I have a pandas DataFrame with a text column and several string label columns. df looks like:
news                           label1  label2   label3  label4
COVID Hospitalizations ....    health
will pets contract covid....   health  pets
High temperature will cause..  health  weather
...
Expected output:
news                           health  pets  weather  tech
COVID Hospitalizations ....    1       0     0        0
will pets contract covid....   1       1     0        0
High temperature will cause..  1       0     1        0
...
Currently I use sklearn:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# collect the four label columns into one list per row
df['labels'] = df[['label1', 'label2', 'label3', 'label4']].values.tolist()
mlb.fit(df['labels'])
temp = mlb.transform(df['labels'])
# one indicator column per class seen during fit
ff = pd.DataFrame(temp, columns=list(mlb.classes_))
df_final = pd.concat([df['news'], ff], axis=1)
This works so far.
Is there a way to do this in pandas alone, without sklearn.preprocessing.MultiLabelBinarizer?
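For reference, here is a rough sketch of what a pandas-only version might look like. It is not a definitive answer: the sample rows are made up to mirror the layout above, and it assumes that missing labels are stored as NaN or as empty strings. The idea is to stack the label columns into long format, one-hot encode with pd.get_dummies, and collapse back to one row per original index.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the layout in the question;
# missing labels are assumed to be NaN (empty strings are filtered out too).
df = pd.DataFrame({
    "news": [
        "COVID Hospitalizations",
        "will pets contract covid",
        "High temperature will cause",
    ],
    "label1": ["health", "health", "health"],
    "label2": [np.nan, "pets", "weather"],
    "label3": [np.nan, np.nan, np.nan],
    "label4": [np.nan, np.nan, np.nan],
})

label_cols = ["label1", "label2", "label3", "label4"]

# Long format: one entry per (original row, label column).
# dropna() is explicit so the result does not depend on stack()'s NaN handling.
labels = df[label_cols].stack().dropna()
labels = labels[labels != ""]

# One indicator column per distinct label, then collapse the long format
# back to one row per original index (level 0 of the stacked index).
dummies = pd.get_dummies(labels, dtype=int).groupby(level=0).max()

# Rows that had no labels at all disappear above; restore them as all zeros.
dummies = dummies.reindex(df.index, fill_value=0)

df_final = pd.concat([df["news"], dummies], axis=1)
print(df_final)
```

As with MultiLabelBinarizer, only labels that actually occur in the data get a column. If a fixed set such as health, pets, weather, tech is required, something like dummies.reindex(columns=[...], fill_value=0) before the concat should add the missing ones as zeros.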