Using a regex expression to create a new DataFrame column

Question

I have the following Python DataFrame:

| ColumnA | File            |
| -------- | -------------- |
| First    | aasdkh.xls     |
| Second   | sadkhZ.xls     |
| Third    | asdasdPH.xls   |
| Fourth   | adsjklahsd.xls |

and so on.

I'm trying to get the following DataFrame:

| ColumnA  | File             | Category |
| -------- | ---------------- | ------- |
| First    | aasdkh.xls       | N       |
| Second   | sadkhZ.xls       | Z       |
| Third    | asdasdPH.xls     | PH      |
| Fourth   | adsjklahsdPH.xls | PH      |

I'm trying to use regular expressions, but I'm not sure how to apply them. I need a new column that "extracts" the category of each file: N if it is a "normal" file (no category), Z if the file name has a "Z" just before the extension, and PH if it has "PH" just before the extension.

I defined the following regular expressions that I think I could use, but I don't know how to apply them:

    import re

    regex_Z = re.compile(r'Z\.xls$')
    regex_PH = re.compile(r'PH\.xls$')

PS: Could you recommend any website for learning how to use regular expressions?

Ynjxsjmh · Accepted Answer · 2022-10-20 15:03:05Z

2

Let's try

    df['Category'] = df['File'].str.extract(r'(Z|PH)\.xls$').fillna('N')

    print(df)

      ColumnA            File Category
    0   First      aasdkh.xls        N
    1  Second      sadkhZ.xls        Z
    2   Third    asdasdPH.xls       PH
    3  Fourth  adsjklahsd.xls        N
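To see what the pattern does on its own, here is a minimal sketch using only the standard `re` module. The parentheses in `(Z|PH)` capture either `Z` or `PH`, `\.` matches a literal dot, and `$` anchors the match to the end of the string. The names `CATEGORY_RE` and `categorize` are hypothetical and are not part of the original answer.

```python
import re

# Same pattern as in the answer: capture "Z" or "PH" right before ".xls" at the end.
CATEGORY_RE = re.compile(r"(Z|PH)\.xls$")


def categorize(filename: str) -> str:
    """Return the captured category, or "N" when the pattern does not match."""
    match = CATEGORY_RE.search(filename)
    return match.group(1) if match else "N"


files = ["aasdkh.xls", "sadkhZ.xls", "asdasdPH.xls", "adsjklahsd.xls"]
categories = [categorize(name) for name in files]
print(categories)  # ['N', 'Z', 'PH', 'N']
```

`str.extract` applies the same capture to every row and leaves `NaN` where nothing matches, which is why the answer finishes with `fillna('N')`.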
