I have a Pandas DataFrame with a list column. I'd like to split this list column into multiple columns, one per distinct value. For each record, each new column should contain yes_<value> or no_<value>, where <value> is the column name, depending on whether that value appears in the record's list.

Example input:

id | values
---|----------
1  | [A,B,C,D]
2  | [D,E,F]
3  | [A,D]
4  | [K]

Expected output:

id | values   |  A    |   B   |   C   |   D   |   E   |   F   |    K  |
---|----------|-------|-------|-------|-------|-------|-------|-------|
1  | [A,B,C,D]| yes_A | yes_B | yes_C | yes_D |  no_E |  no_F |  no_K |
2  | [D,E,F]  | no_A  | no_B  | no_C  | yes_D | yes_E | yes_F |  no_K |
3  | [A,D]    | yes_A | no_B  | no_C  | yes_D |  no_E |  no_F |  no_K | 
4  | [K]      | no_A  | no_B  | no_C  |  no_D |  no_E |  no_F | yes_K | 

3 Answers

You can use a crosstab to reshape:

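import pandas as pd

# one row per (id, value) pair
df2 = df.explode('values')
# count of each value per id (1 if present, 0 if absent), mapped to the prefixes
df3 = pd.crosstab(df2['id'], df2['values']).replace({0: 'no_', 1: 'yes_'})

# append each column name to its prefix ('yes_' + 'A' -> 'yes_A') and merge back on id
out = df.merge(df3.add(df3.columns), left_on='id', right_index=True)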

Or str.get_dummies:

df2 = df['values'].agg('|'.join).str.get_dummies().replace({0: 'no_', 1: 'yes_'})
out = df.join(df2.add(df2.columns))

Output:

   id        values      A      B      C      D      E      F      K
0   1  [A, B, C, D]  yes_A  yes_B  yes_C  yes_D   no_E   no_F   no_K
1   2     [D, E, F]   no_A   no_B   no_C  yes_D  yes_E  yes_F   no_K
2   3        [A, D]  yes_A   no_B   no_C  yes_D   no_E   no_F   no_K
3   4           [K]   no_A   no_B   no_C   no_D   no_E   no_F  yes_K
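One caveat with the crosstab version: pd.crosstab counts occurrences, so if a value appeared twice in the same list (an assumption; the question's data has no repeats), its count would be 2 and replace({0: 'no_', 1: 'yes_'}) would leave it untouched. A minimal sketch that treats any positive count as present instead, building the labels with np.where and broadcasting the column names across rows. The repeated 'A' for id 3 is made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Question's input, except id 3 has a repeated 'A' (hypothetical, to show the issue)
df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'values': [['A', 'B', 'C', 'D'], ['D', 'E', 'F'], ['A', 'D', 'A'], ['K']]})

df2 = df.explode('values')
counts = pd.crosstab(df2['id'], df2['values'])  # occurrence counts, can exceed 1

# Any positive count means "present"; column labels broadcast across all rows
cols = counts.columns.to_numpy(dtype=object)
labels = pd.DataFrame(np.where(counts.gt(0), 'yes_' + cols, 'no_' + cols),
                      index=counts.index, columns=counts.columns)

out = df.merge(labels, left_on='id', right_index=True)
print(out)
```

With the original replace call, id 3's A cell would stay as the integer 2; here it becomes yes_A.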

3 Comments

In the get_dummies approach, why do you need the agg? Doesn't df['values'].str.get_dummies(',').replace({0: 'no_', 1: 'yes_'}) work?
@yona Because the input is a list and str.get_dummies needs a string. Give it a try: it would convert each whole list to a string, which messes up the logic ;)
You're right. I didn't notice that it's a list column and not a string.
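To illustrate the point in this exchange, a small sketch: Series.str.join, when the elements are lists, joins each list's items with the delimiter, giving strings that str.get_dummies can split. The '|' separator is a choice made here (it is str.get_dummies' default), not something from the thread.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'values': [['A', 'B', 'C', 'D'], ['D', 'E', 'F'], ['A', 'D'], ['K']]})

# Without joining, a list only has its string form to offer, e.g.
print(df['values'].astype(str).iloc[0])  # ['A', 'B', 'C', 'D']

# Join each list into 'A|B|C|D', then split it into 0/1 indicator columns
as_strings = df['values'].str.join('|')
dummies = as_strings.str.get_dummies(sep='|')

# Map 0/1 to prefixes and append the column name to each cell
labels = dummies.replace({0: 'no_', 1: 'yes_'}).add(dummies.columns)
out = df.join(labels)
print(out)
```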
Code:

import pandas as pd

# Input
df = pd.DataFrame({'id': [1, 2, 3, 4], 'Values': [['A','B','C','D'], ['D','E','F'], ['A','D'], ['K']]})

# STEP 1: concatenate all the lists in the Values column into one, find the unique values with set,
# and sort them: ['A', 'B', 'C', 'D', 'E', 'F', 'K']; then loop over them
for i in sorted(set(sum(df['Values'].tolist(), []))):
    # STEP 2: create a new column named i and check whether i is in each row's list
    df[i] = df['Values'].apply(lambda x: f'yes_{i}' if i in x else f'no_{i}')
df

Output:

    id  Values          A       B       C       D       E       F    K
0   1   [A, B, C, D]    yes_A   yes_B   yes_C   yes_D   no_E    no_F    no_K
1   2   [D, E, F]       no_A    no_B    no_C    yes_D   yes_E   yes_F   no_K
2   3   [A, D]          yes_A   no_B    no_C    yes_D   no_E    no_F    no_K
3   4   [K]             no_A    no_B    no_C    no_D    no_E    no_F    yes_K
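A possible refinement of STEP 1, sketched here as an alternative rather than the answer's own code: sum(..., []) builds a new, longer list at every addition, so its cost grows quadratically with the total number of items. itertools.chain.from_iterable walks each list once and yields the same unique values.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'Values': [['A', 'B', 'C', 'D'], ['D', 'E', 'F'], ['A', 'D'], ['K']]})

# STEP 1 without repeated list concatenation
unique_values = sorted(set(chain.from_iterable(df['Values'])))

# STEP 2 unchanged: one yes_/no_ column per unique value
for v in unique_values:
    df[v] = df['Values'].apply(lambda x: f'yes_{v}' if v in x else f'no_{v}')

print(df)
```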

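import numpy as np
import pandas as pd

# Map one row of 0/1 indicators to "yes_<col>" / "no_<col>" using the column labels
def function1(ss: pd.Series):
    ss1 = np.where(ss.eq(1), "yes_" + ss.index, "no_" + ss.index)
    return pd.Series(ss1, index=ss.index)

df1 = df['values'].map("-".join).str.get_dummies(sep="-").apply(function1, axis=1)
df.join(df1)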

Output:

   id        values      A      B      C      D      E      F      K
0   1  [A, B, C, D]  yes_A  yes_B  yes_C  yes_D   no_E   no_F   no_K
1   2     [D, E, F]   no_A   no_B   no_C  yes_D  yes_E  yes_F   no_K
2   3        [A, D]  yes_A   no_B   no_C  yes_D   no_E   no_F   no_K
3   4           [K]   no_A   no_B   no_C   no_D   no_E   no_F  yes_K
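The same np.where idea can also be applied to the whole indicator frame at once instead of row by row with apply(axis=1). A sketch of that variant: the column labels form a 1-D array that NumPy broadcasts across every row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'values': [['A', 'B', 'C', 'D'], ['D', 'E', 'F'], ['A', 'D'], ['K']]})

dummies = df['values'].map('-'.join).str.get_dummies(sep='-')

# Shape (n_rows, n_cols) condition, shape (n_cols,) labels: broadcast row-wise
cols = dummies.columns.to_numpy(dtype=object)
labels = pd.DataFrame(np.where(dummies.eq(1), 'yes_' + cols, 'no_' + cols),
                      index=dummies.index, columns=dummies.columns)

out = df.join(labels)
print(out)
```

This avoids calling a Python function once per row, which tends to matter more as the number of rows grows.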
