Split Pandas column of lists into multiple columns based on value

Question

I have a Pandas Dataframe that has a list column. I'd like to split this list column into multiple column based on the value. Returning yes_value or no_value for each record based on the column name.

Example input:

id | values
---|----------
1  | [A,B,C,D]
2  | [D,E,F]
3  | [A,D]
4  | [K]

Expected output:

id | values   |  A    |   B   |   C   |   D   |   E   |   F   |    K  |
---|----------|-------|-------|-------|-------|-------|-------|-------|
1  | [A,B,C,D]| yes_A | yes_B | yes_C | yes_D |  no_E |  no_F |  no_K |
2  | [D,E,F]  | no_A  | no_B  | no_C  | yes_D | yes_E | yes_F |  no_K |
3  | [A,D]    | yes_A | no_B  | no_C  | yes_D |  no_E |  no_F |  no_K | 
4  | [K]      | no_A  | no_B  | no_C  |  no_D |  no_E |  no_F | yes_K |

mozway · Accepted Answer · 2022-10-10 13:51:35Z

2

You can use a crosstab to reshape:

df2 = df.explode('values')
df3 = pd.crosstab(df2['id'], df2['values']).replace({0: 'no_', 1: 'yes_'})

out = df.merge(df3.add(df3.columns), left_on='id', right_index=True)

Or str.get_dummies:

df2 = df['values'].agg('|'.join).str.get_dummies().replace({0: 'no_', 1: 'yes_'})
out = df.join(df2.add(df2.columns))

output:

   id        values      A      B      C      D      E      F      K
0   1  [A, B, C, D]  yes_A  yes_B  yes_C  yes_D   no_E   no_F   no_K
1   2     [D, E, F]   no_A   no_B   no_C  yes_D  yes_E  yes_F   no_K
2   3        [A, D]  yes_A   no_B   no_C  yes_D   no_E   no_F   no_K
3   4           [K]   no_A   no_B   no_C   no_D   no_E   no_F  yes_K

answered Oct 10, 2022 at 13:51

mozway

267k13 gold badges56 silver badges106 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

yona bendelac Over a year ago

in the get_dummies way. why you need the agg? df['values'].str.get_dummies(',').replace({0: 'no_', 1: 'yes_'}) is not working?

mozway Over a year ago

@yona because the input is a list and str.get_dummies needs a string. Give it a try, this would convert the whole list as string messing the logic ;)

yona bendelac Over a year ago

you right. i didn't read that is a list column and not a string

R. Baraiya · Accepted Answer · 2022-10-10 15:04:55Z

Code:

#Input
df = pd.DataFrame({'id':[1,2,3,4], 'Values':[['A','B','C','D'], ['D','E','F'], ['A','D'],  ['K']]})

#STEP 1 merging the Values column all lists into one and findout the unique values using set
#so the output will be {'A', 'B', 'C', 'D', 'E', 'F', 'K'} and then looping on it as below
for i in sorted(set(sum(df['Values'].tolist(),[]))):  

    #STEP 2 Creating new column and check if column in list or not
    df[i] = df['Values'].apply(lambda x: f'yes_{i}' if i in x else f'no_{i}')
df

Output:

    id  Values          A       B       C       D       E       F    K
0   1   [A, B, C, D]    yes_A   yes_B   yes_C   yes_D   no_E    no_F    no_K
1   2   [D, E, F]       no_A    no_B    no_C    yes_D   yes_E   yes_F   no_K
2   3   [A, D]          yes_A   no_B    no_C    yes_D   no_E    no_F    no_K
3   4   [K]             no_A    no_B    no_C    no_D    no_E    no_F    yes_K

G.G · Accepted Answer · 2023-02-17 06:28:15Z

0

def function1(ss:pd.Series):
    ss1=np.where(ss.eq(1),"yes_"+ss.index,"no_"+ss.index)
    return pd.Series(ss1,index=ss.index)

df1=df.Values.map("-".join).str.get_dummies(sep="-").apply(function1,axis=1)
df.join(df1)

output:

   id        values      A      B      C      D      E      F      K
0   1  [A, B, C, D]  yes_A  yes_B  yes_C  yes_D   no_E   no_F   no_K
1   2     [D, E, F]   no_A   no_B   no_C  yes_D  yes_E  yes_F   no_K
2   3        [A, D]  yes_A   no_B   no_C  yes_D   no_E   no_F   no_K
3   4           [K]   no_A   no_B   no_C   no_D   no_E   no_F  yes_K

answered Feb 17, 2023 at 6:28

G.G

7654 silver badges5 bronze badges

Collectives™ on Stack Overflow

Split Pandas column of lists into multiple columns based on value

3 Answers 3

3 Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

3 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related