Pandas: create new dataframe from data extracted from string in old dataframe

Question

Here is a dataframe with sample data:

df = pd.DataFrame({'KEY': ['1','2','3'], 'RECORD': ['1','1','1'], 'SERIAL': ['1470','2321','300'], 'REMARKS': ['FRUIT[APPLES,ORANGES,PEARS] IS HEALTHY FOR YOU','I LIKE FRUIT[BANANAS,CHERRIES,GRAPES], BUT I DON\'T LIKE FRUIT[CANTALOPE,HONEYDEW]', 'THERE IS FRUIT[LEMONS,ORANGES,GRAPEFRUIT] @ 1234']})

I need to extract out the fruit into a new dataframe associated with the KEY, RECORD, and SERIAL. It should look like this when finished:

df = pd.DataFrame({'KEY': ['1','1','1','2','2','2','2','2','3','3','3'], 'RECORD': ['1','1','1','1','1','1','1','1','1','1','1'], 'SERIAL': ['1470','1470','1470','2321','2321','2321','2321','2321','300','300','300'], 'FRUIT': ['APPLES','ORANGES','PEARS','BANANAS','CHERRIES','GRAPES','CANTALOPE','HONEYDEW','LEMONS','ORANGES','GRAPEFRUIT'], 'CODE': ['null','null','null','null','null','null','null','null','1234','1234','1234']})

From the research I've done, it looks like I could use the str.split and/or str.extract, but I'm not sure how to match up each fruit with the KEY, RECORD, and SERIAL. On top of that, the last record has "@ 1234". That information needs to also be extracted and matched up with the 3 fruits listed before it.

I'm guessing the first step in this process is to extract out the fruit, which should be easy because they are all in a series in the string.

Any recommendations on how to tackle this?

Thanks!

Scott Boston · Accepted Answer · 2021-02-02 17:35:35Z

Try this:

df['FruitList'] = df['REMARKS'].str.extract(r'\[(.+?)\]').squeeze().str.split(',')
df['CODE'] = df['REMARKS'].str.extract(r'@\s(\d+)')
df.explode('FruitList')

Output:

  KEY RECORD SERIAL                                            REMARKS   FruitList  CODE
0   1      1   1470     FRUIT[APPLES,ORANGES,PEARS] IS HEALTHY FOR YOU      APPLES   NaN
0   1      1   1470     FRUIT[APPLES,ORANGES,PEARS] IS HEALTHY FOR YOU     ORANGES   NaN
0   1      1   1470     FRUIT[APPLES,ORANGES,PEARS] IS HEALTHY FOR YOU       PEARS   NaN
1   2      1   2321  I LIKE FRUIT[BANANAS,CHERRIES,GRAPES], BUT I D...     BANANAS   NaN
1   2      1   2321  I LIKE FRUIT[BANANAS,CHERRIES,GRAPES], BUT I D...    CHERRIES   NaN
1   2      1   2321  I LIKE FRUIT[BANANAS,CHERRIES,GRAPES], BUT I D...      GRAPES   NaN
2   3      1    300   THERE IS FRUIT[LEMONS,ORANGES,GRAPEFRUIT] @ 1234      LEMONS  1234
2   3      1    300   THERE IS FRUIT[LEMONS,ORANGES,GRAPEFRUIT] @ 1234     ORANGES  1234
2   3      1    300   THERE IS FRUIT[LEMONS,ORANGES,GRAPEFRUIT] @ 1234  GRAPEFRUIT  1234

And you can drop REMARKS if you would like:

df.explode('FruitList').drop('REMARKS', axis=1)

Output:

  KEY RECORD SERIAL   FruitList  CODE
0   1      1   1470      APPLES   NaN
0   1      1   1470     ORANGES   NaN
0   1      1   1470       PEARS   NaN
1   2      1   2321     BANANAS   NaN
1   2      1   2321    CHERRIES   NaN
1   2      1   2321      GRAPES   NaN
2   3      1    300      LEMONS  1234
2   3      1    300     ORANGES  1234
2   3      1    300  GRAPEFRUIT  1234
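
Note that str.extract returns only the first match per row, so the second group in KEY 2, FRUIT[CANTALOPE,HONEYDEW], does not appear in the output. A minimal sketch of one way to pick up every bracketed group, assuming str.findall followed by a join and split is acceptable. This is a variation, not the code from the answer; the column name FRUIT and the variable names groups and result are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'KEY': ['1', '2', '3'],
    'RECORD': ['1', '1', '1'],
    'SERIAL': ['1470', '2321', '300'],
    'REMARKS': [
        'FRUIT[APPLES,ORANGES,PEARS] IS HEALTHY FOR YOU',
        "I LIKE FRUIT[BANANAS,CHERRIES,GRAPES], BUT I DON'T LIKE FRUIT[CANTALOPE,HONEYDEW]",
        'THERE IS FRUIT[LEMONS,ORANGES,GRAPEFRUIT] @ 1234',
    ],
})

# findall returns a list with the contents of every [...] group in each row
groups = df['REMARKS'].str.findall(r'\[(.+?)\]')

# Join the groups with commas, then split once to get one flat list per row
df['FRUIT'] = groups.str.join(',').str.split(',')

# expand=False returns a Series; rows without "@ <digits>" become NaN
df['CODE'] = df['REMARKS'].str.extract(r'@\s(\d+)', expand=False)

# explode returns a new DataFrame with one row per fruit
result = (
    df.explode('FRUIT')
      .drop(columns='REMARKS')
      .reset_index(drop=True)
)
print(result)
```

With the sample data this gives 11 rows, one per fruit. A remark with no bracketed group at all would end up with an empty string in FRUIT rather than NaN, so such rows may need filtering.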

3 Comments

Heather Over a year ago

Thank you for this. It gave me a starting point. My only issue is that the brackets are coming across to the new column, which is keeping the exploding from working. However, at least I now know about squeezing/exploding.

Heather Over a year ago

From my testing, the split and squeezing are working fine. It is the exploding that is not working. All it returns is the fruit list split apart like this: [APPLES, ORANGES, PEARS] still all on the same line.

Heather Over a year ago

I got it finally. I had to reference the explode statement back to df then it worked (df = df.explode('FruitList'))
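
The behaviour described in the last comment follows from DataFrame.explode returning a new DataFrame instead of modifying the original in place. A small sketch with made-up data (the frame below and the names rows_before and rows_after are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    'KEY': ['1', '2'],
    'FruitList': [['APPLES', 'PEARS'], ['GRAPES']],
})

# The return value is discarded here, so df still holds the lists
df.explode('FruitList')
rows_before = len(df)

# Rebinding the name keeps the exploded rows
df = df.explode('FruitList')
rows_after = len(df)

print(rows_before, rows_after)
```

The original index values are repeated for each exploded row; chaining .reset_index(drop=True) gives a fresh 0..n-1 index if that is preferred.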
