Split a dataframe column having a pandas series into multiple columns

Question

I have a pandas DataFrame with many columns. One of them holds a list in each row. I want to split that column into a set of boolean columns: if the value in a row is ['Red', 'Hot', 'Summer'], I want three columns, Red, Hot and Summer, each with value 1 for that row.

Example:

import pandas as pd

df = pd.DataFrame({'Owner': ['Bob', 'Jane', 'Amy'],
                   'Make': ['Ford', 'Ford', 'Jeep'],
                   'Model': ['Bronco', 'Bronco', 'Wrangler'],
                   'Sentiment': [['Meh', 'Red', 'Dirty'], ['Rusty', 'Sturdy'], ['Dirty', 'Red']],
                   'Max Speed': [80, 150, 69],
                   'Customer Rating': [90, 50, 91]})

gives us:

Now I want this output (True/False could also be ones and zeros; either is fine):

Note: I looked at the post "Split a Pandas column of lists into multiple columns", but that only works directly if the series isn't already part of a DataFrame.

Any help appreciated!

Wow, that's pretty good! ...but the answer below, df = pd.concat([df, pd.get_dummies(df['Sentiment'].explode())], axis=1), multiplies the number of rows. I just want to explode the one column into multiple columns without changing the number of rows.
– Alex P, Commented Nov 24, 2021 at 3:17
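The extra rows come from explode() repeating each original index label once per list element. A minimal sketch of one way to keep the get_dummies idea from that one-liner while preserving one row per owner: collapse the dummies back by index label with groupby(level=0).max() before joining. This assumes pandas 0.25 or later (for explode; the dtype argument of get_dummies is older), and the sample frame is trimmed to two columns for brevity.

```python
import pandas as pd

# Question's data, trimmed to the columns that matter here
df = pd.DataFrame({
    'Owner': ['Bob', 'Jane', 'Amy'],
    'Sentiment': [['Meh', 'Red', 'Dirty'], ['Rusty', 'Sturdy'], ['Dirty', 'Red']],
})

# One-hot encode the exploded values; index labels repeat (0, 0, 0, 1, 1, 2, 2)
dummies = pd.get_dummies(df['Sentiment'].explode(), dtype=int)

# Collapse to one row per original label: 1 if any of that row's tags matched
dummies = dummies.groupby(level=0).max()

# Index labels are unique again, so a plain join keeps the original row count
out = df.join(dummies)
print(out)
```

Using max() rather than sum() keeps the result as 0/1 even if a list happens to contain the same tag twice.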

BENY · score 2 · Accepted Answer · 2021-11-24 14:07:56Z

Try explode, then crosstab, then join:

s = df.Sentiment.explode()
out = df.join(pd.crosstab(s.index,s).astype(bool))
out
  Owner  Make     Model          Sentiment  ...    Meh    Red  Rusty  Sturdy
0   Bob  Ford    Bronco  [Meh, Red, Dirty]  ...   True   True  False   False
1  Jane  Ford    Bronco    [Rusty, Sturdy]  ...  False  False   True    True
2   Amy  Jeep  Wrangler       [Dirty, Red]  ...  False   True  False   False
[3 rows x 11 columns]
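The same three calls, split into steps so the intermediate table is visible. This is a sketch on the question's data trimmed to two columns; the variable name counts is introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'Owner': ['Bob', 'Jane', 'Amy'],
    'Sentiment': [['Meh', 'Red', 'Dirty'], ['Rusty', 'Sturdy'], ['Dirty', 'Red']],
})

# Step 1: one row per list element; each element keeps its original row label
s = df.Sentiment.explode()

# Step 2: cross-tabulate (original row label, tag) pairs.
# Rows are the unique original labels, so there is one row per owner again.
counts = pd.crosstab(s.index, s)
print(counts)

# Step 3: a count above zero means "this row has this tag";
# join aligns on the row label, so the row count is unchanged
out = df.join(counts.astype(bool))
print(out)
```

Because crosstab aggregates over the repeated labels, it plays the same role as a groupby on the index, which is why this answer does not multiply rows.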

answered Nov 24, 2021 at 3:25 by BENY; edited Nov 24, 2021 at 14:07



score 0 · Accepted Answer · 2021-11-24 03:56:26Z

Try this:

df = pd.concat([df, pd.get_dummies(df['Sentiment'].explode())], axis=1)

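Output (this run used ['LOVE', 'AWESOME', 'Red'] for Amy's Sentiment rather than the question's ['Dirty', 'Red']):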

>>> df
  Owner  Make     Model             Sentiment  Max Speed  Customer Rating  AWESOME  Dirty  LOVE  Meh  Red  Rusty  Sturdy
0   Bob  Ford    Bronco     [Meh, Red, Dirty]         80               90        0      0     0    1    0      0       0
0   Bob  Ford    Bronco     [Meh, Red, Dirty]         80               90        0      0     0    0    1      0       0
0   Bob  Ford    Bronco     [Meh, Red, Dirty]         80               90        0      1     0    0    0      0       0
1  Jane  Ford    Bronco       [Rusty, Sturdy]        150               50        0      0     0    0    0      1       0
1  Jane  Ford    Bronco       [Rusty, Sturdy]        150               50        0      0     0    0    0      0       1
2   Amy  Jeep  Wrangler  [LOVE, AWESOME, Red]         69               91        0      0     1    0    0      0       0
2   Amy  Jeep  Wrangler  [LOVE, AWESOME, Red]         69               91        1      0     0    0    0      0       0
2   Amy  Jeep  Wrangler  [LOVE, AWESOME, Red]         69               91        0      0     0    0    1      0       0

How it works

What you're looking for is usually called one-hot encoding, and pandas has a function for exactly that: get_dummies(). It takes a Series (or DataFrame) and creates a new column for each unique value in that Series (for a DataFrame, it does this for each object or categorical column).
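A small illustration of get_dummies on a plain Series, using the tags from the question's first example. Passing dtype=int is an assumption for readability: without it the default dtype depends on the pandas version (uint8 before 2.0, bool from 2.0 on).

```python
import pandas as pd

tags = pd.Series(['Red', 'Hot', 'Red', 'Summer'])

# One column per distinct value (columns come out sorted);
# each row has a 1 in the column matching its value, 0 elsewhere
onehot = pd.get_dummies(tags, dtype=int)
print(onehot)
```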

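df['Sentiment'].explode() returns a new Series with one row per list element, containing all the individual values of all the lists in the selected column. Each element keeps the index label of the row it came from, so labels repeat.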
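A quick sketch of what explode() returns for the question's Sentiment values. The repeated index labels are why concatenating the dummies back onto df by index, as in the one-liner above, yields one row per tag instead of one row per owner.

```python
import pandas as pd

sentiment = pd.Series([['Meh', 'Red', 'Dirty'], ['Rusty', 'Sturdy'], ['Dirty', 'Red']],
                      name='Sentiment')

exploded = sentiment.explode()
print(exploded)

# Each element keeps the label of the row it came from
print(exploded.index.tolist())
```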
