I have a pandas df like the one below, and my goal is to turn the MATCHUP column into several dummy columns.

INDICATOR MATCHUP 
1         [   "APPLE",   "GRAPE" ]
1         [   "APPLE",   "GRAPE" ]
0         [   "GRAPE",   "BANANA" ]
0         [   "PEAR",   "ORANGE" ]
1         [   "ORANGE",   "APPLE" ]

Here is the same data as a dict:

{'INDICATOR': [1, 1, 0, 0, 1],
 'MATCHUP': ['[   "APPLE",   "GRAPE" ]',
  '[   "APPLE",   "GRAPE" ]',
  '[   "GRAPE",   "BANANA" ]',
  '[   "PEAR",   "ORANGE" ]',
  '[   "ORANGE",   "APPLE" ]']}

Given this df, I would like to create dummy variables that indicate whether each value appears in MATCHUP.

Final outcome:

INDICATOR MATCHUP                    APPLE GRAPE BANANA PEAR ORANGE
1         [   "APPLE",   "GRAPE" ]   1     1     0      0    0 
1         [   "APPLE",   "GRAPE" ]   1     1     0      0    0
0         [   "GRAPE",   "BANANA" ]  0     1     1      0    0
0         [   "PEAR",   "ORANGE" ]   0     0     0      1    1
1         [   "ORANGE",   "APPLE" ]  1     0     0      0    1

Is there a way to accomplish this using pandas? I attempted it using this, but I think the spacing in the MATCHUP column makes that method unviable.

1 Answer

Try explode with str.get_dummies. The MATCHUP values are strings, so convert them to lists with ast.literal_eval first:

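import ast

# Parse each string into a list, explode to one item per row, one-hot encode, then sum back per original row
dummies = df['MATCHUP'].map(ast.literal_eval).explode().str.get_dummies().groupby(level=0).sum()
df = df.join(dummies)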
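
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the df from the dict in the question and applies the same chain step by step. The intermediate names (parsed, dummies, result) are only illustrative and are not part of the original answer. Note that str.get_dummies orders the new columns alphabetically, so they will not appear in the order shown in the question's desired output.

```python
import ast

import pandas as pd

# Sample data copied from the question; each MATCHUP cell is a string, not a list.
df = pd.DataFrame({
    "INDICATOR": [1, 1, 0, 0, 1],
    "MATCHUP": [
        '[   "APPLE",   "GRAPE" ]',
        '[   "APPLE",   "GRAPE" ]',
        '[   "GRAPE",   "BANANA" ]',
        '[   "PEAR",   "ORANGE" ]',
        '[   "ORANGE",   "APPLE" ]',
    ],
})

# Step 1: turn each string into a real Python list.
parsed = df["MATCHUP"].map(ast.literal_eval)

# Step 2: explode to one fruit per row (the original index is repeated),
# one-hot encode each fruit, then sum per original index to get one row back.
dummies = parsed.explode().str.get_dummies().groupby(level=0).sum()

# Step 3: attach the dummy columns to the original frame.
result = df.join(dummies)
print(result)
```

If a specific column order matters, the dummy columns can be reordered afterwards by selecting them in the desired order.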
2 Comments

Unfortunately this did not work. It did create the dummy variables, but it did not split MATCHUP: instead of APPLE as one variable and GRAPE as another, it returned [ "APPLE", "GRAPE" ] as a single variable.
@JohnThomas check the update; you first need to convert the string back to a list.
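
The behaviour described in these comments can be reproduced with a small sketch, assuming the cells hold strings as in the question's dict. The two-row Series s is made up for illustration. Without parsing, explode has nothing to split, so each whole string becomes its own dummy column; after ast.literal_eval, each fruit gets its own column.

```python
import ast

import pandas as pd

# Hypothetical two-row sample in the same string format as the question.
s = pd.Series(['[   "APPLE",   "GRAPE" ]', '[   "GRAPE",   "BANANA" ]'])

# The cells are plain strings, so explode leaves them unchanged and
# get_dummies treats each whole string as one category.
unparsed = s.explode().str.get_dummies()
print(list(unparsed.columns))

# After parsing, each cell is a list, so explode yields one fruit per row.
parsed = s.map(ast.literal_eval)
fixed = parsed.explode().str.get_dummies().groupby(level=0).sum()
print(list(fixed.columns))
```

Checking type(df['MATCHUP'].iloc[0]) is a quick way to tell which case applies to a given df.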
