How to extract information from pandas dataframe column

Question

I have the DataFrame below. I want to extract pieces of information from column A and store them in new columns according to their type. The example below illustrates this.

In [0]: df
Out[0]: 
          A                  
0 1258GA 25/01/20 TABLE 090626  038272
1 GOODIES 762088 A714816
2 TABLE AA88547 734963 GOODIES
3 WATER 02/450 FROM TOMORROW 48246
4 02H12 ALSCA 00548246B GOODIES

And I want to have the result below.

In [1]: df
Out[1]: 
          A                               Category             Date      Hour
0 1258GA 25/01/20 TABLE 090626  038272    TABLE           25/01/20
1 GOODIES 762088 A714816                  GOODIES 
2 TABLE AA88547 734963 GOODIES            TABLE GOODIES
3 WATER 02/450 FROM TOMORROW 48246        WATER 
4 02H12 ALSCA 00548246B GOODIES           GOODIES                        02H12

I've tried many things but haven't managed to get that result.

For row 3, why is it not WATER FROM TOMORROW? Why is it just WATER? Same for row 4?
– sammywemmy, Commented Feb 19, 2020 at 9:16
It's just an example to explain what I want in the end. Once I know how to do it for this example, I will apply it to my real data.
– hitech, Commented Feb 19, 2020 at 9:23
That doesn't explain why FROM TOMORROW is missing from your expected output.
– Henry Yik, Commented Feb 19, 2020 at 9:24
How could this be upvoted? You should not let readers guess what you want from a simple example. You should first specify the requirement and then illustrate it with an example. Here you missed the first point :-(
– Serge Ballesta, Commented Feb 19, 2020 at 9:24

gbruenjes · Accepted Answer · 2020-02-19 09:23:11Z

1

Maybe this helps:

df['A'].str.findall(r'\b[A-Z]+\b').str.join(' ')

0                  TABLE
1                GOODIES
2          TABLE GOODIES
3    WATER FROM TOMORROW
4          ALSCA GOODIES
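
To reproduce this result, a minimal self-contained sketch could look like the following. The DataFrame is rebuilt from the rows shown in the question; the variable name `words` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"A": [
    "1258GA 25/01/20 TABLE 090626  038272",
    "GOODIES 762088 A714816",
    "TABLE AA88547 734963 GOODIES",
    "WATER 02/450 FROM TOMORROW 48246",
    "02H12 ALSCA 00548246B GOODIES",
]})

# findall returns, for each row, a list of tokens made only of uppercase
# letters; str.join then turns each list into a space-separated string.
words = df["A"].str.findall(r"\b[A-Z]+\b").str.join(" ")
print(words)
```

Tokens such as 1258GA or A714816 are skipped because `\b` only matches at the edge of a run of letters, digits, and underscores, so a letter run next to a digit is never matched on its own.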

edited Feb 19, 2020 at 9:23

answered Feb 19, 2020 at 9:21


2 Comments

yatu Over a year ago

Maybe join the resulting list into strings?

jezrael Over a year ago

df['A'].str.findall(r'\b[A-Z]+\b').str.join(' ')

Shubham Sharma · Accepted Answer · 2020-02-19 09:41:10Z

0

You can do this with the Series.str methods.

The pandas documentation describes `Series.str.extract()` as follows:

Extract capture groups in the regex pat as columns in a DataFrame.

For each subject string in the Series, extract groups from the first match of regular expression pat.

It describes `Series.str.findall()` as follows:

Find all occurrences of pattern or regular expression in the Series/Index.
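
As a small illustration of the difference between the two methods, here is a sketch on a two-row Series. The data and the names `first_word` and `all_words` are made up for the example.

```python
import pandas as pd

s = pd.Series(["TABLE AA88547 734963 GOODIES", "762088 A714816"])

# extract keeps only the first match per row; with a single capture group
# and expand=False the result is a Series, with NaN where nothing matches.
first_word = s.str.extract(r"\b([A-Z]+)\b", expand=False)

# findall keeps every match per row, as a list (empty when nothing matches).
all_words = s.str.findall(r"\b[A-Z]+\b")

print(first_word)
print(all_words)
```

This is why findall suits a column that may hold several words per row, while extract suits a value expected at most once per row, such as a date.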

Here is the code snippet (edited):

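# Every purely alphabetic word, joined into one string per row.
df["Category"] = df["A"].str.findall(r"\b[A-Za-z]+\b").str.join(" ")
# First dd/mm/yy-like token; expand=False returns a Series (NaN if absent).
df["Date"] = df["A"].str.extract(r"\b([0-9]+/[0-9]+/[0-9]+)\b", expand=False)
# First token of the form digits, H, digits (e.g. 02H12).
df["Hour"] = df["A"].str.extract(r"\b([0-9]+H[0-9]+)\b", expand=False)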

The output is:

                                      A             Category      Date   Hour
0  1258GA 25/01/20 TABLE 090626  038272                TABLE  25/01/20    NaN
1                GOODIES 762088 A714816              GOODIES       NaN    NaN
2          TABLE AA88547 734963 GOODIES        TABLE GOODIES       NaN    NaN
3      WATER 02/450 FROM TOMORROW 48246  WATER FROM TOMORROW       NaN    NaN
4         02H12 ALSCA 00548246B GOODIES        ALSCA GOODIES       NaN  02H12
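
The expected output in the question keeps only TABLE, GOODIES, and WATER and drops FROM, TOMORROW, and ALSCA, but the question never states the rule behind that. If the categories are assumed to come from a known list, a sketch along these lines restricts the match to that list. The `CATEGORIES` list is a hypothetical assumption and is not given in the question.

```python
import re
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"A": [
    "1258GA 25/01/20 TABLE 090626  038272",
    "GOODIES 762088 A714816",
    "TABLE AA88547 734963 GOODIES",
    "WATER 02/450 FROM TOMORROW 48246",
    "02H12 ALSCA 00548246B GOODIES",
]})

# Hypothetical whitelist: assumed for illustration, not stated by the asker.
CATEGORIES = ["TABLE", "GOODIES", "WATER"]

# Alternation of the escaped names, matched as whole words only.
pattern = r"\b(?:" + "|".join(map(re.escape, CATEGORIES)) + r")\b"

df["Category"] = df["A"].str.findall(pattern).str.join(" ")
df["Date"] = df["A"].str.extract(r"\b(\d+/\d+/\d+)\b", expand=False)
df["Hour"] = df["A"].str.extract(r"\b(\d+H\d+)\b", expand=False)
print(df)
```

With this list, row 3 yields WATER and row 4 yields GOODIES, as in the question's expected output. Missing dates and hours are NaN; chaining `.fillna("")` would give blank cells instead.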

edited Feb 19, 2020 at 9:41

answered Feb 19, 2020 at 9:25


3 Comments

hitech Over a year ago

Is it possible to have both TABLE and GOODIES in Category on line 2 with your code?

Shubham Sharma Over a year ago

What do you mean by Category in line 2? Could you please explain in detail?

hitech Over a year ago

I mean the 2nd line of the Category column, and that is what I've done by editing your code. Thanks.
