Here is a dataframe with sample data:
import pandas as pd
df = pd.DataFrame({
    'KEY': ['1', '2', '3'],
    'RECORD': ['1', '1', '1'],
    'SERIAL': ['1470', '2321', '300'],
    'REMARKS': ['FRUIT[APPLES,ORANGES,PEARS] IS HEALTHY FOR YOU',
                'I LIKE FRUIT[BANANAS,CHERRIES,GRAPES], BUT I DON\'T LIKE FRUIT[CANTALOPE,HONEYDEW]',
                'THERE IS FRUIT[LEMONS,ORANGES,GRAPEFRUIT] @ 1234']})
I need to extract the fruit into a new dataframe, with each fruit kept alongside its KEY, RECORD, and SERIAL. When finished, it should look like this:
df = pd.DataFrame({
    'KEY': ['1','1','1','2','2','2','2','2','3','3','3'],
    'RECORD': ['1','1','1','1','1','1','1','1','1','1','1'],
    'SERIAL': ['1470','1470','1470','2321','2321','2321','2321','2321','300','300','300'],
    'FRUIT': ['APPLES','ORANGES','PEARS','BANANAS','CHERRIES','GRAPES','CANTALOPE','HONEYDEW','LEMONS','ORANGES','GRAPEFRUIT'],
    'CODE': ['null','null','null','null','null','null','null','null','1234','1234','1234']})
From the research I've done, it looks like I could use str.split and/or str.extract, but I'm not sure how to match each fruit up with its KEY, RECORD, and SERIAL. On top of that, the last record contains "@ 1234". That code also needs to be extracted and matched with the three fruits listed before it.
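For the "@ 1234" part, one option is str.extract with a single capture group. This is a minimal sketch, assuming the code is always a run of digits after "@" (the regex is a guess based on the single example). Because str.extract returns its result on the same index as df, each extracted code stays on the same row as its KEY, RECORD, and SERIAL, so no separate matching step is needed. Rows without a code get NaN rather than the string 'null'.

```python
import pandas as pd

df = pd.DataFrame({
    'KEY': ['1', '2', '3'],
    'RECORD': ['1', '1', '1'],
    'SERIAL': ['1470', '2321', '300'],
    'REMARKS': [
        'FRUIT[APPLES,ORANGES,PEARS] IS HEALTHY FOR YOU',
        "I LIKE FRUIT[BANANAS,CHERRIES,GRAPES], BUT I DON'T LIKE FRUIT[CANTALOPE,HONEYDEW]",
        'THERE IS FRUIT[LEMONS,ORANGES,GRAPEFRUIT] @ 1234',
    ],
})

# One capture group + expand=False returns a Series aligned to df's index.
# Assumption: the code is the digits following "@"; rows without it get NaN.
df['CODE'] = df['REMARKS'].str.extract(r'@\s*(\d+)', expand=False)
print(df[['KEY', 'RECORD', 'SERIAL', 'CODE']])
```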
I'm guessing the first step is to extract the fruit, which should be easy because each list is enclosed in FRUIT[...] with the items separated by commas.
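A possible sketch of that first step and the rest of the reshaping, assuming every fruit list is written as FRUIT[...] with comma-separated items and no nested brackets. str.findall collects every bracketed list in a row (the row with KEY 2 has two), the lists are joined and split into one Python list per row, and DataFrame.explode then repeats KEY, RECORD, SERIAL, and CODE once per fruit. The name fruit_lists is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'KEY': ['1', '2', '3'],
    'RECORD': ['1', '1', '1'],
    'SERIAL': ['1470', '2321', '300'],
    'REMARKS': [
        'FRUIT[APPLES,ORANGES,PEARS] IS HEALTHY FOR YOU',
        "I LIKE FRUIT[BANANAS,CHERRIES,GRAPES], BUT I DON'T LIKE FRUIT[CANTALOPE,HONEYDEW]",
        'THERE IS FRUIT[LEMONS,ORANGES,GRAPEFRUIT] @ 1234',
    ],
})

# Code after "@" (NaN where absent), extracted while there is still one row per KEY.
df['CODE'] = df['REMARKS'].str.extract(r'@\s*(\d+)', expand=False)

# Every "FRUIT[...]" list in a row, e.g. ['BANANAS,CHERRIES,GRAPES', 'CANTALOPE,HONEYDEW'].
fruit_lists = df['REMARKS'].str.findall(r'FRUIT\[([^\]]*)\]')

# Merge a row's lists into one string, then split into individual fruits.
df['FRUIT'] = fruit_lists.str.join(',').str.split(',')

# explode gives each fruit its own row and copies the other columns down.
result = (
    df.explode('FRUIT')
      .loc[:, ['KEY', 'RECORD', 'SERIAL', 'FRUIT', 'CODE']]
      .reset_index(drop=True)
)
print(result)
```

Missing codes come out as NaN; if the literal string 'null' is really needed, result['CODE'].fillna('null') would convert them. A row with no FRUIT[...] at all would produce a single empty-string FRUIT row, so those may need filtering if that can occur in the real data.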
Any recommendations on how to tackle this?
Thanks!

