I have a DataFrame that has only column headers, ['234','apple','banana','orange'], and a list like this:

l=['apple', 'banana']

The list is extracted from a column of another DataFrame: I take the unique values of the column fruits with fruits.unique(), which returns an array, and then loop over its values and store them in a list.

I want to loop over the list and check whether each value is present among the column headers of the DataFrame. Where a value matches a column header, that column should get 1; columns without a match should get 0. In the above case the DataFrame should look like this after matching:

234  apple  banana  orange
  0      1       1       0
  • df[l] = 1 or df[l] += 1 ? It's not very clear what you are looking for. Commented Mar 4, 2022 at 6:10

1 Answer

If you need a one-row DataFrame, convert the column names to a DataFrame with Index.to_frame, compare them against the list with DataFrame.isin, cast the resulting True/False values to integers to get 1/0, and transpose:

import pandas as pd

df = pd.DataFrame(columns=['234','apple','banana','orange'])
l = ['apple', 'banana']

# flag each column name that appears in l, then turn the flags into one row
df = df.columns.to_frame().isin(l).astype(int).T
print(df)
   234  apple  banana  orange
0    0      1       1       0
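As a possible variation on the one-row case, the comparison can also be done on the column Index itself, which skips the to_frame and transpose steps. This is only a sketch under the same assumptions as above (an empty DataFrame with those four headers and a flat list); the names flags and out are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(columns=['234', 'apple', 'banana', 'orange'])
l = ['apple', 'banana']

# Index.isin returns a boolean NumPy array, one flag per column name
flags = df.columns.isin(l).astype(int)

# wrap the flags in a list so they become a single row
out = pd.DataFrame([flags], columns=df.columns)
print(out)
```

Whether this reads better than the to_frame version is a matter of taste; both rely on isin for the matching.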

If it is a nested list, use MultiLabelBinarizer from scikit-learn:

df = pd.DataFrame(columns=['234','apple','banana','orange'])

L= [['apple', 'banana'], ['apple', 'orange', 'apple']]

from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
df = (pd.DataFrame(mlb.fit_transform(L),columns=mlb.classes_)
        .reindex(df.columns, fill_value=0, axis=1))
print (df)
   234  apple  banana  orange
0    0      1       1       0
1    0      1       0       1
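If adding scikit-learn as a dependency is not wanted, roughly the same table can probably be built with pandas alone. The sketch below is a swapped-in technique, not MultiLabelBinarizer: it explodes the nested list into one item per row, builds indicator columns with get_dummies, and collapses them back to one row per inner list. The names cols, s and out are illustrative.

```python
import pandas as pd

cols = ['234', 'apple', 'banana', 'orange']
L = [['apple', 'banana'], ['apple', 'orange', 'apple']]

# one row per item; the index keeps the position of the originating list
s = pd.Series(L).explode()

out = (pd.get_dummies(s)           # one indicator column per distinct item
         .groupby(level=0).max()   # collapse back to one row per list
         .astype(int)              # True/False -> 1/0
         .reindex(columns=cols, fill_value=0))
print(out)
```

Taking the maximum per group means a repeated item such as the second 'apple' still yields 1 rather than a count.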

EDIT: If the data come from a column of another DataFrame, the solution is very similar to the second one:

df = pd.DataFrame(columns=['234','apple','banana','orange'])

df1 = pd.DataFrame({"col":[['apple', 'banana'],['apple', 'orange', 'apple']]})
print (df1)
                      col
0         [apple, banana]
1  [apple, orange, apple]

from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
df = (pd.DataFrame(mlb.fit_transform(df1['col']),columns=mlb.classes_)
         .reindex(df.columns, fill_value=0, axis=1))
print (df)
   234  apple  banana  orange
0    0      1       1       0
1    0      1       0       1
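The question also mentions looping over the result of fruits.unique() to build the list. Assuming fruits is a plain column of strings in another DataFrame, the loop should not be necessary, because the array can be converted to a list directly. In this sketch the DataFrame other and its contents are hypothetical stand-ins for that data.

```python
import pandas as pd

df = pd.DataFrame(columns=['234', 'apple', 'banana', 'orange'])

# hypothetical source data standing in for the other DataFrame
other = pd.DataFrame({'fruits': ['apple', 'banana', 'apple']})

# unique() keeps the order of first appearance; list() converts without a manual loop
l = list(other['fruits'].unique())

# same one-row construction as in the first solution
out = df.columns.to_frame().isin(l).astype(int).T
print(out)
```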

5 Comments

How do I append data to df for n lists? df = df.columns.to_frame().isin(l).astype(int).T handles only one list, since this is a Series.
@Swetha - what does "n number of lists" mean? Is it L from the second code sample?
@Swetha Is it l=['apple', 'banana'], l1=['apple', 'banana','apple']? Then use L = [l, l1] with the second solution.
Lists like l=['banana'], l=['apple','234'], l=['234']: loop over every list and write the appropriate 0 or 1 to the DataFrame, so the result has one row for each list we send. In the above case df should look like: 234 apple banana orange | 0 0 1 0 | 1 1 0 0 | 1 0 0 0
@Swetha - answer was edited.
