I have the dataframe below and I want to extract some information from column A, then create new columns that hold the extracted values according to their type. The example below illustrates this.

In [0]: df
Out[0]: 
          A                  
0 1258GA 25/01/20 TABLE 090626  038272
1 GOODIES 762088 A714816
2 TABLE AA88547 734963 GOODIES
3 WATER 02/450 FROM TOMORROW 48246
4 02H12 ALSCA 00548246B GOODIES

And I want to have the result below.

In [1]: df
Out[1]: 
          A                               Category             Date      Hour
0 1258GA 25/01/20 TABLE 090626  038272    TABLE           25/01/20
1 GOODIES 762088 A714816                  GOODIES 
2 TABLE AA88547 734963 GOODIES            TABLE GOODIES
3 WATER 02/450 FROM TOMORROW 48246        WATER 
4 02H12 ALSCA 00548246B GOODIES           GOODIES                        02H12

I've tried many things but haven't managed to get that result.

  • For row 3, why is it not WATER FROM TOMORROW? Why is it just WATER? Same for row 4? Commented Feb 19, 2020 at 9:16
  • It's just an example to explain what I want to end up with. Once I know how to do it with this example, I'll apply it to my real data. Commented Feb 19, 2020 at 9:23
  • But the example has no comprehensible logic. Commented Feb 19, 2020 at 9:24
  • That doesn't explain why FROM TOMORROW is missing from your expected output. Commented Feb 19, 2020 at 9:24
  • How could this be upvoted? You should not let readers guess what you want from a simple example. You should instead first specify the requirement, and then illustrate it with an example. Here you missed the first point :-( Commented Feb 19, 2020 at 9:24

2 Answers

Maybe this helps:

df['A'].str.findall(r'\b[A-Z]+\b').str.join(' ')

0                  TABLE
1                GOODIES
2          TABLE GOODIES
3    WATER FROM TOMORROW
4          ALSCA GOODIES

2 Comments

Maybe join the resulting list into strings?
df['A'].str.findall(r'\b[A-Z]+\b').str.join(' ')
You can certainly do that using the Series.str methods.

The documentation describes Series.str.extract() as follows:

Extract capture groups in the regex pat as columns in a DataFrame.

For each subject string in the Series, extract groups from the first match of regular expression pat.


The documentation describes Series.str.findall() as follows:

Find all occurrences of pattern or regular expression in the Series/Index.
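To make the difference concrete, here is a minimal sketch on two of the strings from the question. The variable names are only illustrative, and `expand=False` is used so that `extract` returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Two sample strings taken from column A in the question.
s = pd.Series([
    "TABLE AA88547 734963 GOODIES",
    "02H12 ALSCA 00548246B GOODIES",
])

# Standalone runs of uppercase letters (tokens mixing letters and digits are skipped).
pattern = r"\b([A-Z]+)\b"

# extract() keeps only the first match per string.
first = s.str.extract(pattern, expand=False)

# findall() keeps every match per string, as a list.
all_matches = s.str.findall(pattern)

print(first.tolist())        # ['TABLE', 'ALSCA']
print(all_matches.tolist())  # [['TABLE', 'GOODIES'], ['ALSCA', 'GOODIES']]
```

This is why `findall` followed by `.str.join(' ')` suits the Category column, where several words can appear, while `extract` is enough for Date and Hour, where at most one value is expected per row.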

Here is the code snippet:

EDIT:

df["Category"] = df['A'].str.findall(r"(\b[A-Za-z]+\b)").str.join(' ')
df["Date"] = df['A'].str.extract(r"(\b[0-9]+/[0-9]+/[0-9]+\b)")
df["Hour"] = df['A'].str.extract(r"(\b[0-9]+H[0-9]+\b)")

And the output will be:

                                      A             Category      Date   Hour
0  1258GA 25/01/20 TABLE 090626  038272                TABLE  25/01/20    NaN
1                GOODIES 762088 A714816              GOODIES       NaN    NaN
2          TABLE AA88547 734963 GOODIES        TABLE GOODIES       NaN    NaN
3      WATER 02/450 FROM TOMORROW 48246  WATER FROM TOMORROW       NaN    NaN
4         02H12 ALSCA 00548246B GOODIES        ALSCA GOODIES       NaN  02H12
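Note that this output keeps FROM TOMORROW and ALSCA, which the expected result in the question leaves out. The question never states the rule behind that, so the following is only a sketch under one possible assumption: that categories come from a known, fixed list. The `CATEGORIES` list and the two-digit date and hour patterns are hypothetical choices made for illustration, not something the question specifies.

```python
import re

import pandas as pd

df = pd.DataFrame({"A": [
    "1258GA 25/01/20 TABLE 090626  038272",
    "GOODIES 762088 A714816",
    "TABLE AA88547 734963 GOODIES",
    "WATER 02/450 FROM TOMORROW 48246",
    "02H12 ALSCA 00548246B GOODIES",
]})

# Hypothetical vocabulary: assumed here, not given in the question.
CATEGORIES = ["TABLE", "GOODIES", "WATER"]

# Alternation of the known category words, matched as whole words only.
category_pattern = r"\b(?:" + "|".join(map(re.escape, CATEGORIES)) + r")\b"

# Keep every known category found in the row, joined by spaces.
df["Category"] = df["A"].str.findall(category_pattern).str.join(" ")

# Assumed formats: dd/mm/yy for dates and hhHmm for hours.
df["Date"] = df["A"].str.extract(r"\b(\d{2}/\d{2}/\d{2})\b", expand=False)
df["Hour"] = df["A"].str.extract(r"\b(\d{2}H\d{2})\b", expand=False)

print(df)
```

With this vocabulary, Category becomes TABLE, GOODIES, TABLE GOODIES, WATER and GOODIES for rows 0 to 4. Rows without a date or hour hold NaN; calling `.fillna("")` on those columns would give empty strings instead.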

3 Comments

Is it possible to get both TABLE and GOODIES in Category on row 2 with your code?
What do you mean by Category in row 2? Could you please explain in detail?
I mean in row 2 of the Category column, and that's what I've achieved by editing your code. Thanks.
