Converting indicator numbers to binary values

Question

I have two dataframes (pandas.DataFrame), shown below. Let's call the first one df_A.

    code1   code2   code3   code4   code5   
0   1       4       2       0       0 
1   3       2       1       5       0   
2   2       3       0       0       0   

    has1    has2    has3    has4    has5
0   1       1       0       1       0              
1   1       1       1       0       1
2   0       1       1       0       0

The objects (rows) are each given up to five codes, shown by the five columns in the first df.

Instead, I want a binary representation of which codes each object has, as shown in the second df.

The functions in pandas or scikit-learn for dummy values take into account which position the code is written in, but this is unimportant here.

The attempts I have made with my own code have not worked, due to my inexperience in Python and pandas.

This case is different from others I have seen on Stack Overflow, as all the columns represent the same thing.

Thank you!

Edit:

for colname in df_bin.columns:
    for row in range(len(df_codes)):
        if int(colname) in df_codes.iloc[[row]]:
            df_bin[colname][row]=1

This is one of the attempts I made so far.

Please post what you have so far.

– Red, Commented Jun 29, 2020 at 14:56

BENY · Accepted Answer · 2020-06-29 14:58:11Z

3

You can try stack, then str.get_dummies:

# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
s=df.stack().loc[lambda x : x!=0].astype(str).str.get_dummies().groupby(level=0).sum().add_prefix('Has')
   Has1  Has2  Has3  Has4  Has5
0     1     1     0     1     0
1     1     1     1     0     1
2     0     1     1     0     0
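
For readers unfamiliar with the lambda, the chain can be unrolled step by step. This is only a sketch of the same idea, not a replacement for the one-liner: the intermediate variable names are made up for illustration, the data is copied from the question, and the final step uses max() instead of sum() on the assumption that a strictly 0/1 result is wanted even when a code is repeated within a row.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "code1": [1, 3, 2],
    "code2": [4, 2, 3],
    "code3": [2, 1, 0],
    "code4": [0, 5, 0],
    "code5": [0, 0, 0],
})

# 1. stack() turns the wide frame into one long Series indexed by (row, column).
stacked = df.stack()

# 2. The lambda receives that Series and returns a boolean mask,
#    so .loc keeps only the non-zero entries (0 means "no code").
#    Equivalent without a lambda: stacked[stacked != 0]
nonzero = stacked.loc[lambda x: x != 0]

# 3. str.get_dummies works on strings; it yields one 0/1 column per distinct code.
dummies = nonzero.astype(str).str.get_dummies()

# 4. Collapse back to one row per original row. max() keeps the result 0/1
#    even if a code is repeated within a row (sum() would count the repeats).
result = dummies.groupby(level=0).max().add_prefix("Has")
print(result)
```

Because the codes are converted to strings, the columns are sorted as text, so with codes of 10 or above the column order may need to be fixed afterwards.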


1 Comment

Professional_n00b Over a year ago

This worked! I have seen lambda used before but never understood it. I will look into it more. Thank you very much!

Quang Hoang · Accepted Answer · 2020-06-29 14:59:48Z

1

Let's try:

(df.stack().groupby(level=0)
   .value_counts()
   .unstack(fill_value=0)
   [range(1,6)]
   .add_prefix('has')
)

Output:

   has1  has2  has3  has4  has5
0     1     1     0     1     0
1     1     1     1     0     1
2     0     1     1     0     0
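
Two edge cases may be worth guarding against: selecting [range(1,6)] raises a KeyError if one of the codes never occurs in the data, and value_counts returns a count above 1 if a code is repeated within a row. The sketch below handles both with reindex and clip. The sample data is a hypothetical variation of the question's frame (code 5 never appears, code 2 appears twice in row 1), and all_codes assumes the valid codes are 1 to 5.

```python
import pandas as pd

# Hypothetical variation of the question's data:
# code 5 never occurs, and row 1 contains code 2 twice.
df = pd.DataFrame({
    "code1": [1, 3, 2],
    "code2": [4, 2, 3],
    "code3": [2, 2, 0],
    "code4": [0, 0, 0],
    "code5": [0, 0, 0],
})

all_codes = list(range(1, 6))  # assumption: valid codes are 1..5

# Count how often each value occurs in each row (same steps as above).
counts = (
    df.stack()
      .groupby(level=0)
      .value_counts()
      .unstack(fill_value=0)
)

has = (
    counts.reindex(columns=all_codes, fill_value=0)  # add absent codes, drop the 0 column
          .clip(upper=1)                             # repeated codes still give 1
          .add_prefix("has")
)
print(has)
```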


Scott Boston · Accepted Answer · 2020-06-29 15:54:23Z

0

Here's another way using pd.crosstab:

df_out = df.reset_index().melt('index')
df_out = pd.crosstab(df_out['index'], df_out['value']).drop(0, axis=1).add_prefix('has')

Output:

value  has1  has2  has3  has4  has5
index                              
0         1     1     0     1     0
1         1     1     1     0     1
2         0     1     1     0     0

edited Jun 29, 2020 at 15:54

answered Jun 29, 2020 at 15:16


2 Comments

Professional_n00b Over a year ago

Thank you. I am unfamiliar with crosstab. The code seems to work, and the output df looks as expected. The dataframe has the right dimensions, but df_out.shape is totally different, and I cannot access the columns the way I am used to. How is this new table structured, and how would I get the same result as I usually would with df_out["has1"]?

Scott Boston Over a year ago

@Professional_n00b The output looks different because of the column header name ('value'). You can still access the dataframe as usual. However, we need to assign the output of pd.crosstab back to df_out, which I didn't do in the original solution. (I have changed the answer to include the re-assignment back to df_out.)
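
To illustrate the point about the header name, here is a small sketch using the data from the question. The names long_df and plain are made up for the example. The 'value' and 'index' labels in the printed output are only the names of the column and row axes, so ordinary column access should still work, and rename_axis can remove them if they get in the way.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "code1": [1, 3, 2],
    "code2": [4, 2, 3],
    "code3": [2, 1, 0],
    "code4": [0, 5, 0],
    "code5": [0, 0, 0],
})

# Long format: one row per (original row, code) pair.
long_df = df.reset_index().melt("index")

# The crosstab result is assigned back to df_out, as in the edited answer.
df_out = (
    pd.crosstab(long_df["index"], long_df["value"])
      .drop(0, axis=1)        # 0 means "no code"
      .add_prefix("has")
)

print(df_out.shape)             # (3, 5)
print(df_out.columns.name)      # value  (just a label on the column axis)
print(df_out["has1"].tolist())  # [1, 1, 0]

# Optional: remove the axis labels so the frame prints like an ordinary one.
plain = df_out.rename_axis(index=None, columns=None)
print(plain)
```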
