Here's a way to do this.
First, call `df['Label'].apply()` to replace each comma-separated string with a list of labels and, as a side effect, to populate a Python dict that maps each label to a new column index (in order of first appearance).
Then create a second DataFrame, `df2`, that fills the new label columns as specified in the question.
Finally, concatenate the two DataFrames horizontally and drop the 'Label' column.
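```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'ID': [0, 1, 2],
    'Label': ['apple, tom, car', 'apple, car', 'tom, apple']
})

# labels maps each label to its new column index (in order of first appearance);
# labelInfo carries that dict plus the next free index through apply()
labels = {}
labelInfo = [labels, 0]

def foo(x, labelInfo):
    # Split the comma-separated string into a list of stripped labels
    theseLabels = [s.strip() for s in x.split(',')]
    labels, curLabelIdx = labelInfo
    for label in theseLabels:
        if label not in labels:
            labels[label] = curLabelIdx
            curLabelIdx += 1
    # Remember the next free index for the following row
    labelInfo[1] = curLabelIdx
    return theseLabels

df['Label'] = df['Label'].apply(foo, labelInfo=labelInfo)

# For every row, emit each known label if the row has it, else the string 'None'
df2 = pd.DataFrame(
    np.array(df['Label'].apply(lambda x: [s if s in x else 'None' for s in labels]).to_list()),
    columns=list(labels.values()))
df = pd.concat([df, df2], axis=1).drop(columns=['Label'])
print(df)
```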
Output:
```
   ID      0     1     2
0   0  apple   tom   car
1   1  apple  None   car
2   2  apple   tom  None
```
If you'd prefer to have the new columns named after the labels they contain, replace the `df2` assignment line with this (only the `columns` argument changes, from `list(labels.values())` to `list(labels)`):
```python
df2 = pd.DataFrame(
    np.array(df['Label'].apply(lambda x: [s if s in x else 'None' for s in labels]).to_list()),
    columns=list(labels))
```
Now the output is:
```
   ID  apple   tom   car
0   0  apple   tom   car
1   1  apple  None   car
2   2  apple   tom  None
```
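Both variants can also be wrapped in one function that avoids the module-level `labels` dict and the mutable `labelInfo` list. This is a minimal sketch, not the code above: `expand_labels` and its `by_name` flag are hypothetical names, it assumes pandas 0.25 or later (for `Series.explode`), and it relies on `dict.fromkeys` keeping keys in first-appearance order (guaranteed for dicts since Python 3.7).

```python
import pandas as pd

def expand_labels(df, by_name=False):
    # Hypothetical helper. Step 1: split each string into a list of stripped labels
    lists = df['Label'].apply(lambda s: [t.strip() for t in s.split(',')])
    # Map each label to a column index, in order of first appearance
    labels = {lab: i for i, lab in enumerate(dict.fromkeys(lists.explode()))}
    # Step 2: for each row, the label if present, else the string 'None'
    rows = [[lab if lab in row else 'None' for lab in labels] for row in lists]
    columns = list(labels) if by_name else list(labels.values())
    df2 = pd.DataFrame(rows, columns=columns, index=df.index)
    # Step 3: put the new columns next to the originals, without 'Label'
    return pd.concat([df.drop(columns=['Label']), df2], axis=1)

df = pd.DataFrame({
    'ID': [0, 1, 2],
    'Label': ['apple, tom, car', 'apple, car', 'tom, apple'],
})
print(expand_labels(df))                # numbered columns 0, 1, 2
print(expand_labels(df, by_name=True))  # columns apple, tom, car
```

Because the label-to-index mapping is built inside the function from the exploded labels, `apply()` has no side effects and the input DataFrame is left unmodified (`df.drop` returns a copy), whereas the original overwrites `df['Label']` and then `df` itself.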