I have the following pandas DataFrame:

| ColumnA | File            |
| -------- | -------------- |
| First    | aasdkh.xls     |
| Second   | sadkhZ.xls     |
| Third    | asdasdPH.xls   |
| Fourth   | adsjklahsd.xls |

and so on.

I'm trying to get the following DataFrame:

| ColumnA | File              | Category|
| -------- | ---------------- | ------- |
| First    | aasdkh.xls       | N       |
| Second   | sadkhZ.xls       | Z       |
| Third    | asdasdPH.xls     | PH      |
| Fourth   | adsjklahsd.xls   | N       |

I'm trying to use regular expressions, but I'm not sure how to apply them here. I need a new column that extracts the category of each file: N if it is a "normal" file (no category), Z if the file name contains a "Z" just before the extension, and PH if it contains "PH" just before the extension.

I defined the following regular expressions that I think I could use, but I don't know how to apply them to the DataFrame:

    import re

    regex_Z = re.compile(r'Z\.xls$')    # escape the dot so it matches a literal "."
    regex_PH = re.compile(r'PH\.xls$')

PS: Could you recommend a website for learning how to use regular expressions?

1 Answer

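Try `str.extract` with a capture group, then fill the rows that do not match with `N`. The group `(Z|PH)` captures whichever suffix sits immediately before `.xls` at the end of the name; rows without a match come back as NaN, which `fillna('N')` replaces.

    df['Category'] = df['File'].str.extract(r'(Z|PH)\.xls$', expand=False).fillna('N')
    print(df)

Output:

      ColumnA            File Category
    0   First      aasdkh.xls        N
    1  Second      sadkhZ.xls        Z
    2   Third    asdasdPH.xls       PH
    3  Fourth  adsjklahsd.xls        N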
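
If you would rather keep the two compiled patterns from the question, here is a minimal sketch of how they could be used directly. The helper name `categorize` is my own and not part of any library, and the dot is escaped so that it only matches a literal ".". The example runs on a plain list of file names so it does not depend on pandas.

```python
import re

# Patterns from the question, with the dot escaped to match a literal "."
regex_Z = re.compile(r'Z\.xls$')
regex_PH = re.compile(r'PH\.xls$')


def categorize(filename):
    """Return 'PH', 'Z', or 'N' for a file name (hypothetical helper)."""
    # search() scans the whole string; the trailing $ anchors the match at the end
    if regex_PH.search(filename):
        return 'PH'
    if regex_Z.search(filename):
        return 'Z'
    return 'N'  # no category suffix before the extension


files = ['aasdkh.xls', 'sadkhZ.xls', 'asdasdPH.xls', 'adsjklahsd.xls']
categories = [categorize(name) for name in files]
print(categories)  # ['N', 'Z', 'PH', 'N']
```

With the DataFrame from the question, `df['Category'] = df['File'].map(categorize)` should give the same column as the `str.extract` version. The vectorized `str.extract` call is usually the shorter option; the helper is mainly useful if the rules grow beyond what one pattern can express.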