Adding rows based on column value

Question

I have a DataFrame that contains only column headers, ['234','apple','banana','orange'], and a list like:

l=['apple', 'banana']

The list comes from a column of another DataFrame: I take the unique values of the fruits column with fruits.unique(), which returns an array, and then loop over it to store the items in a list.

I want to loop over the list and check whether each value is present among the columns of the DataFrame. If it is, put 1 under the matching column header; otherwise put 0 under the columns that do not match. In the case above, the DataFrame after matching should look like:

234  apple  banana  orange
  0      1       1       0

df[l] = 1 or df[l] += 1? It's not very clear what you are looking for.
– akuiper, Commented Mar 4, 2022 at 6:10

jezrael · Accepted Answer · 2022-03-04 07:39:39Z


If you need a one-row DataFrame, convert the column names to a DataFrame with Index.to_frame and test them against the list with DataFrame.isin; then cast the resulting True/False values to integers to get 1/0, and transpose:

import pandas as pd

df = pd.DataFrame(columns=['234','apple','banana','orange'])
l = ['apple', 'banana']

# one boolean per column name -> 1/0 -> transpose into a single row
df = df.columns.to_frame().isin(l).astype(int).T
print(df)
   234  apple  banana  orange
0    0      1       1       0
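
The question mentions building the list from fruits.unique() by looping over the result. A minimal sketch of that step, assuming a hypothetical source frame df1 with a fruits column (both names are illustrative, not from the original post); it also uses Index.isin directly, as an alternative to the to_frame/transpose route above:

```python
import pandas as pd

# Hypothetical source frame; the column name "fruits" follows the question.
df1 = pd.DataFrame({"fruits": ["apple", "banana", "apple", "banana"]})
df = pd.DataFrame(columns=["234", "apple", "banana", "orange"])

# unique() returns an array; tolist() converts it without an explicit loop.
l = df1["fruits"].unique().tolist()

# Index.isin gives one boolean per column header;
# wrapping it in a list turns it into a single row.
row = pd.DataFrame([df.columns.isin(l).astype(int)], columns=df.columns)
print(row)
```

Here row holds one line of 0/1 flags in the same column order as df.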

If it is a nested list (one inner list per output row), use MultiLabelBinarizer:

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

df = pd.DataFrame(columns=['234','apple','banana','orange'])
L = [['apple', 'banana'], ['apple', 'orange', 'apple']]

mlb = MultiLabelBinarizer()
# reindex adds the columns that never occur in L (here '234'), filled with 0
df = (pd.DataFrame(mlb.fit_transform(L), columns=mlb.classes_)
        .reindex(df.columns, fill_value=0, axis=1))
print(df)
   234  apple  banana  orange
0    0      1       1       0
1    0      1       0       1
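
If scikit-learn is not available, roughly the same table can be built with pandas alone. This is a sketch of a swapped-in technique (explode plus get_dummies), not part of the original answer; the variable names s and out are illustrative:

```python
import pandas as pd

df = pd.DataFrame(columns=["234", "apple", "banana", "orange"])
L = [["apple", "banana"], ["apple", "orange", "apple"]]

# One element per row; the index keeps track of the originating inner list.
s = pd.Series(L).explode()

out = (
    pd.get_dummies(s)                            # one indicator column per label
    .groupby(level=0)                            # collapse back to one row per list
    .max()                                       # repeated labels still count once
    .reindex(columns=df.columns, fill_value=0)   # add absent headers such as '234'
    .astype(int)
)
print(out)
```

Taking max per group, rather than sum, keeps the result binary even when a label such as apple appears twice in one list.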

EDIT: If the data come from another DataFrame column, the solution is very similar to the second one:

import pandas as pd

df = pd.DataFrame(columns=['234','apple','banana','orange'])

df1 = pd.DataFrame({"col": [['apple', 'banana'], ['apple', 'orange', 'apple']]})
print(df1)
                      col
0         [apple, banana]
1  [apple, orange, apple]

from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# each list in df1['col'] becomes one row of 0/1 indicators
df = (pd.DataFrame(mlb.fit_transform(df1['col']), columns=mlb.classes_)
        .reindex(df.columns, fill_value=0, axis=1))
print(df)
   234  apple  banana  orange
0    0      1       1       0
1    0      1       0       1
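
For the case raised in the comments, several flat lists with one output row per list, the one-row idea can also be repeated per list. A minimal sketch using Index.isin, with the three example lists taken from the comment thread; the names lists, rows and out are illustrative:

```python
import pandas as pd

df = pd.DataFrame(columns=["234", "apple", "banana", "orange"])

# One inner list per desired output row.
lists = [["banana"], ["apple", "234"], ["234"]]

# For each list, flag which column headers it contains (True/False -> 1/0).
rows = [df.columns.isin(items).astype(int) for items in lists]
out = pd.DataFrame(rows, columns=df.columns)
print(out)
```

Values that do not match any column header are simply ignored, so the result always has exactly the columns of df.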

edited Mar 4, 2022 at 7:39

answered Mar 4, 2022 at 6:11

jezrael


5 Comments

Swetha Over a year ago

How can I append rows to df for n lists? df = df.columns.to_frame().isin(l).astype(int).T handles only a single list, since this is a series.

jezrael Over a year ago

@Swetha - what does "n number of lists" mean? Is it L from the second code sample?

jezrael Over a year ago

@Swetha - is it l=['apple', 'banana'], l1=['apple', 'banana','apple']? Then use L = [l, l1] with the second solution.

Swetha Over a year ago

For example l=['banana'], l=['apple','234'], l=['234']: loop over every list and write the appropriate 0 or 1 to the DataFrame, so the result has one row for each list we send. In the case above df should look like:
234  apple  banana  orange
  0      0       1       0
  1      1       0       0
  1      0       0       0

jezrael Over a year ago

@Swetha - answer was edited.
