
I have a dataset like this:

import pandas as pd

sample = {'Theme': ['never give a ten','interaction speed','no feedback,premium'],
        'cat1': [0,0,0],
        'cat2': [0,0,0],
        'cat3': [0,0,0],
        'cat4': [0,0,0]
        }

df = pd.DataFrame(sample, columns=['Theme','cat1','cat2','cat3','cat4'])


                 Theme  cat1  cat2  cat3  cat4
0     never give a ten     0     0     0     0
1    interaction speed     0     0     0     0
2  no feedback,premium     0     0     0     0

Now I need to update the cat columns based on the value in Theme. If Theme contains 'never give a ten', set cat1 to 1; if it contains 'interaction speed', set cat2 to 1; if it contains 'no feedback', set cat3 to 1; and if it contains 'premium', set cat4 to 1.

This sample has 4 categories, but my real data has 21 in total. I could write an "if word in string" check 21 times, once per category, but I am looking for a more efficient way: a function that goes through every row, applies the logic, and updates the corresponding columns. Can anyone help, please?

Thanks in advance.

1 Answer

You can create one indicator column per category with Series.str.get_dummies. Note that the resulting column names are sorted alphabetically:

df1 = df['Theme'].str.get_dummies(',')
print (df1)
   interaction speed  never give a ten  no feedback  premium
0                  0                 1            0        0
1                  1                 0            0        0
2                  0                 0            1        1
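
To make clearer what str.get_dummies does here, the following is a rough hand-written equivalent, offered only as a sketch: it splits each Theme on ',' and sets a 0/1 flag per distinct label. The DataFrame is rebuilt from the question's sample so the block runs on its own, and the names labels and manual are illustrative, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Theme': ['never give a ten', 'interaction speed', 'no feedback,premium']})

# Collect the distinct labels across all rows, sorted as get_dummies sorts its columns
labels = sorted({label for theme in df['Theme'] for label in theme.split(',')})

# One 0/1 column per label: 1 if the label is among the row's comma-separated parts
manual = pd.DataFrame(
    {label: [int(label in theme.split(',')) for theme in df['Theme']] for label in labels},
    index=df.index,
)
print(manual)
```

In practice str.get_dummies is the shorter way to get the same flags; the loop is shown only to illustrate the logic.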

If you also need the original Theme column in the output, add DataFrame.join:

df11 = df[['Theme']].join(df['Theme'].str.get_dummies(','))
print (df11)
                 Theme  interaction speed  never give a ten  no feedback  \
0     never give a ten                  0                 1            0   
1    interaction speed                  1                 0            0   
2  no feedback,premium                  0                 0            1   

   premium  
0        0  
1        0  
2        1  

If the order of the columns is important, add DataFrame.reindex:

# remove possible duplicates while keeping the order of first appearance
cols = dict.fromkeys([y for x in df['Theme'] for y in x.split(',')]).keys()
df2 = df['Theme'].str.get_dummies(',').reindex(cols, axis=1)
print (df2)
   never give a ten  interaction speed  no feedback  premium
0                 1                  0            0        0
1                 0                  1            0        0
2                 0                  0            1        1


# same as above, but keeping the Theme column via join
cols = dict.fromkeys([y for x in df['Theme'] for y in x.split(',')]).keys()
df2 = df[['Theme']].join(df['Theme'].str.get_dummies(',').reindex(cols, axis=1))
print (df2)
                 Theme  never give a ten  interaction speed  no feedback  \
0     never give a ten                 1                  0            0   
1    interaction speed                 0                  1            0   
2  no feedback,premium                 0                  0            1   

   premium  
0        0  
1        0  
2        1  
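
Since the question wants the flags under cat1 to cat4 (and 21 categories in the real data), one possible follow-up is to keep an explicit mapping from theme label to target column and rename the dummy columns with it. The sketch below assumes such a mapping, theme_to_cat, which is hypothetical and would need to be extended to all 21 labels. Reindexing with fill_value=0 keeps a column for every mapped label even if it never occurs in the data.

```python
import pandas as pd

df = pd.DataFrame({'Theme': ['never give a ten', 'interaction speed', 'no feedback,premium']})

# Hypothetical mapping from theme label to target column; extend to all 21 categories
theme_to_cat = {
    'never give a ten': 'cat1',
    'interaction speed': 'cat2',
    'no feedback': 'cat3',
    'premium': 'cat4',
}

flags = (
    df['Theme'].str.get_dummies(',')
    # keep only mapped labels, in mapping order; labels absent from the data become all-0 columns
    .reindex(columns=list(theme_to_cat), fill_value=0)
    .rename(columns=theme_to_cat)
)
result = df[['Theme']].join(flags)
print(result)
```

Any label in Theme that is not in the mapping is silently dropped by the reindex step, so it may be worth checking for unmapped labels first.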

1 Comment

Wow. That's genius. Let me try that. Thanks
