Create dummy variable of multiple columns with python

Question

I am working with a dataframe containing two columns of ID numbers. For further research I want to create dummy variables from these ID numbers, combining both columns into one set of indicators. My code, however, produces separate dummy columns for each ID column instead of merging them. How can I combine the two columns and create a single set of dummy variables?

Dataframe

import pandas as pd
import numpy as np
d = {'ID1': [1,2,3], 'ID2': [2,3,4]}
df = pd.DataFrame(data=d)

Current code

pd.get_dummies(df, prefix = ['ID1', 'ID2'], columns=['ID1', 'ID2'])

Desired output

p = {'1': [1,0,0], '2': [1,1,0], '3': [0,1,1], '4': [0,0,1]}
df2 = pd.DataFrame(data=p)
df2

jezrael · Accepted Answer · 2019-03-15 12:51:07Z

2

If you need indicators in the output, use max; if you need counts, use sum. Either way, first cast the values to strings and call get_dummies with an empty prefix and prefix_sep so that both columns produce the same labels:

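# .max(level=0, axis=1) was removed in pandas 2.0; group the transposed frame instead
# dtype=int keeps 0/1 output (pandas 2.0+ returns booleans by default)
df = pd.get_dummies(df.astype(str), prefix='', prefix_sep='', dtype=int).T.groupby(level=0).max().T
# count alternative
# df = pd.get_dummies(df.astype(str), prefix='', prefix_sep='', dtype=int).T.groupby(level=0).sum().T
print(df)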
   1  2  3  4
0  1  1  0  0
1  0  1  1  0
2  0  0  1  1
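The difference between max and sum only shows up when the same ID appears in both columns of one row. A minimal sketch, assuming a hypothetical frame `df_dup` (not from the question) and grouping the transposed dummy frame so that it should run on both older and current pandas versions:

```python
import pandas as pd

# Hypothetical data: row 0 holds the same ID in both columns.
df_dup = pd.DataFrame({'ID1': [1, 2], 'ID2': [1, 3]})

# Empty prefix and separator give both source columns identical labels.
dummies = pd.get_dummies(df_dup.astype(str), prefix='', prefix_sep='', dtype=int)

# Collapse equally named columns: transpose, group on the label, transpose back.
indicators = dummies.T.groupby(level=0).max().T  # 1 if the ID occurs at all
counts = dummies.T.groupby(level=0).sum().T      # number of occurrences per row

print(indicators)
print(counts)
```

Here `indicators` stays at 1 for ID 1 in row 0, while `counts` records 2 because the ID occurs in both columns.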

edited Mar 15, 2019 at 12:51

answered Mar 15, 2019 at 12:45


cs95 · Accepted Answer · 2019-03-15 12:45:03Z

2

Different ways of skinning a cat; here's how I'd do it—use an additional groupby:

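# groupby(..., axis=1) is deprecated since pandas 2.1, so group the transposed frame instead
# pd.get_dummies(df.astype(str), dtype=int).T.groupby(lambda x: x.split('_')[1]).sum().T
pd.get_dummies(df.astype(str), dtype=int).T.groupby(lambda x: x.split('_')[1]).max().T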

   1  2  3  4
0  1  1  0  0
1  0  1  1  0
2  0  0  1  1

Another option is stacking, if you like conciseness:

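# .sum(level=0) and .max(level=0) were removed in pandas 2.0; use groupby(level=0) instead
# pd.get_dummies(df.stack(), dtype=int).groupby(level=0).sum()
pd.get_dummies(df.stack(), dtype=int).groupby(level=0).max()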

   1  2  3  4
0  1  1  0  0
1  0  1  1  0
2  0  0  1  1
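One detail worth checking: the stacked variant builds its column labels from the integer IDs, whereas the desired output in the question uses string labels. A minimal sketch, using the hypothetical names `result` and `desired` and written with `groupby(level=0)` so that it should run on current pandas, that aligns the labels and compares the values:

```python
import pandas as pd

df = pd.DataFrame({'ID1': [1, 2, 3], 'ID2': [2, 3, 4]})

# Desired output from the question, with string column labels.
desired = pd.DataFrame({'1': [1, 0, 0], '2': [1, 1, 0],
                        '3': [0, 1, 1], '4': [0, 0, 1]})

# Stack both ID columns into one Series, build dummies, then collapse per original row.
result = pd.get_dummies(df.stack(), dtype=int).groupby(level=0).max()

# Convert the integer labels to strings to match the desired frame.
result.columns = result.columns.astype(str)

same_labels = list(result.columns) == list(desired.columns)
same_values = result.to_numpy().tolist() == desired.to_numpy().tolist()
print(same_labels, same_values)
```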

answered Mar 15, 2019 at 12:45
