I have a pandas DataFrame whose label columns contain strings. df looks like:

news                          label1      label2      label3  label4
COVID Hospitalizations ....   health
will pets contract covid....  health      pets
High temperature will cause.. health      weather
...

Expected output:

news                          health      pets      weather  tech
COVID Hospitalizations ....   1           0         0        0 
will pets contract covid....  1           1         0        0
High temperature will cause.. 1           0         1        0
... 

Currently I use sklearn:

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# collect the four label columns into one list per row
df['labels'] = df[['label1','label2','label3','label4']].values.tolist()
mlb.fit(df['labels'])
temp = mlb.transform(df['labels'])
# one indicator column per class found by the binarizer
ff = pd.DataFrame(temp, columns = list(mlb.classes_))
df_final = pd.concat([df['news'],ff], axis=1)

This works so far. I am just wondering whether there is a way to avoid using sklearn.preprocessing.MultiLabelBinarizer?

1 Answer

One idea is to join the values with | and then use Series.str.get_dummies:

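# if there are missing values (NaN), replace them with empty strings first
#df = df.fillna('')
# join the label columns row-wise with '|', then split them into indicator columns
df_final = df.set_index('news').agg('|'.join, axis=1).str.get_dummies().reset_index()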
print (df_final)
                            news  health  pets  weather
0    COVID Hospitalizations ....       1     0        0
1   will pets contract covid....       1     1        0
2  High temperature will cause..       1     0        1
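
For reference, here is a self-contained sketch of the same approach on the sample rows from the question. The DataFrame contents (with empty strings standing in for missing labels) and the all_labels list are assumptions made for illustration. The reindex step is one possible way to get the all-zero tech column shown in the expected output, since str.get_dummies only creates columns for labels that actually occur in the data.

```python
import pandas as pd

# Sample rows reconstructed from the question (assumption: missing labels are empty strings).
df = pd.DataFrame({
    "news": ["COVID Hospitalizations ....",
             "will pets contract covid....",
             "High temperature will cause.."],
    "label1": ["health", "health", "health"],
    "label2": ["", "pets", "weather"],
    "label3": ["", "", ""],
    "label4": ["", "", ""],
})

# Join the label columns row-wise with "|" and split them back into indicator columns.
joined = df.set_index("news").agg("|".join, axis=1)
dummies = joined.str.get_dummies(sep="|")

# Hypothetical full label set, so labels absent from the data (here "tech") still get a column.
all_labels = ["health", "pets", "weather", "tech"]
df_final = dummies.reindex(columns=all_labels, fill_value=0).reset_index()
print(df_final)
```

If the full label set is not known in advance, the reindex step can simply be left out.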

Or use get_dummies:

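# dtype=int keeps 0/1 output (newer pandas versions return booleans by default)
df_final = (pd.get_dummies(df.set_index('news'), prefix='', prefix_sep='', dtype=int)
              # groupby(..., axis=1) is deprecated in recent pandas, so transpose instead
              .T.groupby(level=0)
              .max()
              .T
              .reset_index())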

# the second column name is an empty string; that is the difference from the solution above
print (df_final)
                            news     health  pets  weather
0    COVID Hospitalizations ....  1       1     0        0
1   will pets contract covid....  1       1     1        0
2  High temperature will cause..  1       1     0        1
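
If the empty-string column is unwanted, it can be dropped afterwards. The following is a minimal sketch, assuming missing labels are stored as empty strings (as the output above suggests) and reusing sample rows reconstructed from the question. It transposes before grouping because groupby(..., axis=1) is deprecated in recent pandas releases.

```python
import pandas as pd

# Sample rows reconstructed from the question (assumption: missing labels are empty strings).
df = pd.DataFrame({
    "news": ["COVID Hospitalizations ....",
             "will pets contract covid....",
             "High temperature will cause.."],
    "label1": ["health", "health", "health"],
    "label2": ["", "pets", "weather"],
    "label3": ["", "", ""],
    "label4": ["", "", ""],
})

# One indicator column per (label column, value) pair; the empty prefix leaves only the value as name.
dummies = pd.get_dummies(df.set_index("news"), prefix="", prefix_sep="", dtype=int)

# Merge columns that share a name by taking the maximum, grouping on the transposed frame.
merged = dummies.T.groupby(level=0).max().T

# Drop the column produced by the empty strings, if present.
df_final = merged.drop(columns=[""], errors="ignore").reset_index()
print(df_final)
```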