Creating a dataframe with binary valued columns with pandas using values from an existing dataframe

Question

I am trying to create a new dataframe with binary (0 or 1) values from an existing dataframe. For every row in the given dataframe, the program should take the value from each cell and set 1 in the corresponding column of the row with the same index in the new dataframe.

I have tried executing the following code snippet.

for col in products:
    # walk down each column; enumerate supplies the row number
    for index, item in enumerate(products[col]):
        # .loc replaces .ix, which has been removed from pandas
        products_coded.loc[index, 'prod_' + str(item)] = 1

It works for a small number of rows, but it takes a lot of time for any large dataset. What would be the best way to get the desired outcome?

jezrael · Accepted Answer · 2018-02-02 06:20:48Z

I think you need:

First use get_dummies after casting the values to strings.
Aggregate by column name with max, so columns that share a name collapse into one.
Convert the column names to int so they sort numerically.
Use reindex to order the columns and add the missing ones; fill_value=0 replaces the NaNs with 0, and starting the range at 1 drops the 0 column.
Use add_prefix to rename the columns.
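
The steps above can also be run one at a time to inspect each intermediate result. This is a minimal sketch, not the answer's exact code: the intermediate names (dummies, combined, full, result) are made up for illustration, and it assumes a reasonably recent pandas. It passes dtype=int to get_dummies so the output stays 0/1, and it collapses duplicate column names with a transposed groupby, since the level argument of max is no longer available in pandas 2.x.

```python
import pandas as pd

# Sample data: each cell holds a product id, 0 meaning "no product".
df = pd.DataFrame({'B': [3, 1, 12, 12, 8],
                   'C': [0, 6, 0, 14, 0],
                   'D': [0, 14, 0, 0, 0]})

# Step 1: one indicator column per (source column, value) pair.
# With an empty prefix the column name is just the value, so names
# such as '0' and '14' appear more than once.
dummies = pd.get_dummies(df.astype(str), prefix='', prefix_sep='', dtype=int)

# Step 2: collapse columns that share a name by taking the max.
# Transposing lets groupby work on the column labels.
combined = dummies.T.groupby(level=0).max().T

# Step 3: turn the string labels into integers so they order numerically.
combined.columns = combined.columns.astype(int)

# Step 4: keep columns 1..max, adding absent ones filled with 0.
# Starting the range at 1 leaves out the 0 column.
full = combined.reindex(columns=range(1, int(df.to_numpy().max()) + 1),
                        fill_value=0)

# Step 5: rename the columns to prod_1, prod_2, ...
result = full.add_prefix('prod_')
print(result)
```

Printing dummies.columns after step 1 shows the repeated labels that step 2 is there to merge.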

import pandas as pd

df = pd.DataFrame({'B':[3,1,12,12,8],
                   'C':[0,6,0,14,0],
                   'D':[0,14,0,0,0]})

print(df)
    B   C   D
0   3   0   0
1   1   6  14
2  12   0   0
3  12  14   0
4   8   0   0

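# dtype=int keeps 0/1 output on pandas 2.x, where get_dummies defaults to bool
# .max(level=0, axis=1) was removed in pandas 2.0; the transposed groupby below is equivalent
df1 = (pd.get_dummies(df.astype(str), prefix='', prefix_sep='', dtype=int)
         .T.groupby(level=0).max().T
         .rename(columns=lambda x: int(x))
         .reindex(columns=range(1, df.values.max() + 1), fill_value=0)
         .add_prefix('prod_'))
print(df1)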

   prod_1  prod_2  prod_3  prod_4  prod_5  prod_6  prod_7  prod_8  prod_9  \
0       0       0       1       0       0       0       0       0       0   
1       1       0       0       0       0       1       0       0       0   
2       0       0       0       0       0       0       0       0       0   
3       0       0       0       0       0       0       0       0       0   
4       0       0       0       0       0       0       0       1       0   

   prod_10  prod_11  prod_12  prod_13  prod_14  
0        0        0        0        0        0  
1        0        0        0        0        1  
2        0        0        1        0        0  
3        0        0        1        0        1  
4        0        0        0        0        0

Another similar solution:

df1 = (pd.get_dummies(df.astype(str), prefix='', prefix_sep='', dtype=int)
         .T.groupby(level=0).max().T)   # replaces .max(level=0, axis=1), removed in pandas 2.0

df1.columns = df1.columns.astype(int)

df1 = (df1.reindex(columns=range(1, df1.columns.max() + 1), fill_value=0)
          .add_prefix('prod_'))

edited Feb 2, 2018 at 6:20

answered Feb 1, 2018 at 14:34


2 Comments

Subhanandh Over a year ago

I get the following error: TypeError: reindex() got an unexpected keyword argument "axis"

jezrael Over a year ago

That error comes from an older version of pandas. You only need .reindex(columns=range(1, df1.columns.max() + 1), fill_value=0) instead of .reindex(range(1, df1.columns.max() + 1), axis=1, fill_value=0)
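
To illustrate the point in this comment: on a pandas version that accepts axis in reindex (as far as I know, 0.21 and later), the two spellings should give the same frame, while columns= also works on older releases. A small sketch with a made-up two-column frame; by_keyword and by_axis are illustrative names only.

```python
import pandas as pd

# Tiny stand-in for df1 after its column labels were converted to int.
df1 = pd.DataFrame({1: [0, 1], 3: [1, 0]})
cols = range(1, 5)

# Spelling that also works on older pandas: name the axis via the keyword.
by_keyword = df1.reindex(columns=cols, fill_value=0)

# Spelling from the earlier revision of the answer: labels plus axis=1.
by_axis = df1.reindex(cols, axis=1, fill_value=0)

print(by_keyword.equals(by_axis))
```

If the second call raises the TypeError quoted above, the installed pandas predates the axis argument and the columns= form is the one to use.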
