
I have a Pandas DataFrame of the form:

Current

product_typ

[Milo, Milk, Sugar]
[Water, Tea, Milo]
[Bread, Water]
[Bread, Water, Milo]
[Salt, Water, Milo]
[Milo, Milk, Water, Bread]
[Salt, Milk, Bread]
[Milo, Milk]

I would like to create a new column of the following form, possibly using a regex. Keep in mind that this is a Pandas DataFrame.

Expected Output

product_typ                          matched_col

[Milo, Milk, Sugar]                Product_Milo_Milk_Sugar
[Water, Tea, Milo]                 Product_Water_Tea_Milo
[Bread, Water]                     Product_Bread_Water
[Bread, Water, Milo]               Product_Bread_Water_Milo
[Salt, Water, Milo]                Product_Salt_Water_Milo
[Milo, Milk, Water, Bread]         Product_Milo_Milk_Water_Bread
[Salt, Milk, Bread]                Product_Salt_Milk_Bread
[Milo, Milk]                       Product_Milo_Milk

I attempted this with str.findall. Matching the pattern works, but I am stuck on the replacement.

  • What have you tried so far? Commented Apr 20, 2020 at 15:41

1 Answer


Like this maybe (the 'Product_' prefix is prepended to match the expected output):

df['matched_col'] = ['Product_' + '_'.join(map(str, l)) for l in df['product_typ']]

OR

df['matched_col'] = 'Product_' + df['product_typ'].apply('_'.join)

Example:

In [1681]: df = pd.DataFrame({'A': [['a','b','c'], ['b','c']]})

In [1682]: df
Out[1682]: 
           A
0  [a, b, c]
1     [b, c]

In [1684]: df['b'] = ['_'.join(map(str, l)) for l in df['A']]

In [1685]: df
Out[1685]: 
           A      b
0  [a, b, c]  a_b_c
1     [b, c]    b_c
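
Applied to data shaped like the question's, a minimal sketch could look like the following. It assumes every cell of product_typ is a list, and it uses only the first three rows of the sample for brevity. The 'Product_' prefix is taken from the expected output.

```python
import pandas as pd

# First three rows of the sample data from the question.
df = pd.DataFrame({
    'product_typ': [
        ['Milo', 'Milk', 'Sugar'],
        ['Water', 'Tea', 'Milo'],
        ['Bread', 'Water'],
    ]
})

# Join each list with underscores and prepend the fixed prefix.
# map(str, ...) converts any non-string item before joining.
df['matched_col'] = [
    'Product_' + '_'.join(map(str, items)) for items in df['product_typ']
]

print(df['matched_col'].tolist())
```

No regex is needed here, since each cell is already a list rather than a string to be parsed.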

2 Comments

I get the error "TypeError: sequence item 0: expected str instance, float found". When I check the first item, this is what I see: ['Milk', 'Milo']
Please apply this only to the column that holds the lists, not to the entire DataFrame. Another possible reason is that the column contains float values. Try it on a sample of the DataFrame that you are sure contains proper lists of strings.
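
If the column may contain floats or missing values, one defensive option is sketched below. The helper name join_products and the sample rows are hypothetical, made up for illustration. It assumes that cells which are not lists should produce None rather than raise.

```python
import pandas as pd

def join_products(items, prefix='Product'):
    """Hypothetical helper: build 'Product_a_b' from a list-like cell."""
    # Cells that are not lists or tuples (for example NaN) yield None.
    if not isinstance(items, (list, tuple)):
        return None
    # str() guards against float or other non-string items inside the list.
    return '_'.join([prefix, *map(str, items)])

# Illustrative data: a clean list, a list with a float, and a missing cell.
df = pd.DataFrame({'product_typ': [['Milk', 'Milo'], [1.5, 'Tea'], float('nan')]})
df['matched_col'] = df['product_typ'].apply(join_products)

print(df)
```

Converting each item with str avoids the "expected str instance, float found" error, at the cost of silently accepting values that may indicate dirty data.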
