1

I have a pandas dataframe with many columns. One of them is a series. I want to split that column into a set of boolean columns. So, if the value in a row is ['Red','Hot','Summer'], I want 3 columns: Red (having value 1), Hot (having value 1) and Summer (having value 1).

Example:

df = pd.DataFrame({'Owner': ['Bob', 'Jane', 'Amy'],
               'Make': ['Ford', 'Ford', 'Jeep'],
               'Model': ['Bronco', 'Bronco', 'Wrangler'],
               'Sentiment': [['Meh','Red','Dirty'], ['Rusty','Sturdy'], ['Dirty','Red']],
               'Max Speed': [80, 150, 69],
              'Customer Rating': [90, 50, 91]})

gives us:

enter image description here

Now I want this output: enter image description here (the True/False could be ones and zeros, too. Just as good).

note: I looked at this post below: Split a Pandas column of lists into multiple columns but that only directly works if your series isn't already part of a DF.

any help appreciated!

1
  • wow, that's pretty good! ...but this answer below: df = pd.concat([df, pd.get_dummies(df['Sentiment'].explode())], axis=1) multiplies the number of rows. I just want to explode the one column to multiple columns without changing the number of rows. Commented Nov 24, 2021 at 3:17

2 Answers 2

2

Try explode then crosstab and join

s = df.Sentiment.explode()
out = df.join(pd.crosstab(s.index,s).astype(bool))
out
  Owner  Make     Model          Sentiment  ...    Meh    Red  Rusty  Sturdy
0   Bob  Ford    Bronco  [Meh, Red, Dirty]  ...   True   True  False   False
1  Jane  Ford    Bronco    [Rusty, Sturdy]  ...  False  False   True    True
2   Amy  Jeep  Wrangler       [Dirty, Red]  ...  False   True  False   False
[3 rows x 11 columns]
Sign up to request clarification or add additional context in comments.

Comments

0

Try this:

df = pd.concat([df, pd.get_dummies(df['Sentiment'].explode())], axis=1)

Output:

>>> df
  Owner  Make     Model             Sentiment  Max Speed  Customer Rating  AWESOME  Dirty  LOVE  Meh  Red  Rusty  Sturdy
0   Bob  Ford    Bronco     [Meh, Red, Dirty]         80               90        0      0     0    1    0      0       0
0   Bob  Ford    Bronco     [Meh, Red, Dirty]         80               90        0      0     0    0    1      0       0
0   Bob  Ford    Bronco     [Meh, Red, Dirty]         80               90        0      1     0    0    0      0       0
1  Jane  Ford    Bronco       [Rusty, Sturdy]        150               50        0      0     0    0    0      1       0
1  Jane  Ford    Bronco       [Rusty, Sturdy]        150               50        0      0     0    0    0      0       1
2   Amy  Jeep  Wrangler  [LOVE, AWESOME, Red]         69               91        0      0     1    0    0      0       0
2   Amy  Jeep  Wrangler  [LOVE, AWESOME, Red]         69               91        1      0     0    0    0      0       0
2   Amy  Jeep  Wrangler  [LOVE, AWESOME, Red]         69               91        0      0     0    0    1      0       0

How it works

What you're looking for is usually called one-hot encoding, and there is a method in pandas just for that: get_dummies(). It takes a Series (or DataFrame) and creates a new column for each unique value in that Series (or DataFrame).

df['Sentiment'].explode() creates a new column, containing all the individual values of all the lists in the selected column(s).

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.