I have a Pandas dataframe that looks like this:

+---+--------+-------------+------------------+
|   | ItemID | Description | Feedback         |
+---+--------+-------------+------------------+
| 0 | 8988   | Tall Chair  | I hated it       |
+---+--------+-------------+------------------+
| 1 | 8988   | Tall Chair  | Best chair ever  |
+---+--------+-------------+------------------+
| 2 | 6547   | Big Pillow  | Soft and amazing |
+---+--------+-------------+------------------+
| 3 | 6547   | Big Pillow  | Horrific color   |
+---+--------+-------------+------------------+

And I want to concatenate the values from the "Feedback" column into a new column, separated by commas, where the ItemID matches. Like so:

+---+--------+-------------+----------------------------------+
|   | ItemID | Description | NewColumn                        |
+---+--------+-------------+----------------------------------+
| 0 | 8988   | Tall Chair  | I hated it, Best chair ever      |
+---+--------+-------------+----------------------------------+
| 1 | 6547   | Big Pillow  | Soft and amazing, Horrific color |
+---+--------+-------------+----------------------------------+

I've tried several variations of pivot, merge, stacking, etc., and I'm stuck.
I think NewColumn would end up being an array, but I'm fairly new to Python, so I'm not certain.
Ultimately, I want to use this for text classification: for a new "Description", generate some "Feedback" labels (a multiclass problem).

2 Answers

Call .groupby('ItemID') on your dataframe, then join the strings in the Feedback column within each group:

df.groupby('ItemID')['Feedback'].apply(lambda x: ', '.join(x))

See Pandas groupby: How to get a union of strings.
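
To try this end to end, here is a minimal sketch that rebuilds the sample frame from the question. The variable name `joined` is only for illustration. Note that the result is a Series indexed by ItemID (sorted by key), not a dataframe with a NewColumn column.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ItemID": [8988, 8988, 6547, 6547],
    "Description": ["Tall Chair", "Tall Chair", "Big Pillow", "Big Pillow"],
    "Feedback": ["I hated it", "Best chair ever",
                 "Soft and amazing", "Horrific color"],
})

# Join all Feedback strings that share an ItemID
joined = df.groupby("ItemID")["Feedback"].apply(", ".join)

# The result is a Series keyed by ItemID
print(joined.loc[8988])
```

Passing `', '.join` directly does the same thing as the lambda, since each group is handed to the function as a Series of strings.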

I think you can group by the ItemID and Description columns, apply join, and finally call reset_index:

print(df.groupby(['ItemID', 'Description'])['Feedback'].apply(', '.join).reset_index(name='NewColumn'))
   ItemID Description                         NewColumn
0    6547  Big Pillow  Soft and amazing, Horrific color
1    8988  Tall Chair       I hated it, Best chair ever

If you don't need the Description column:

print(df.groupby(['ItemID'])['Feedback'].apply(', '.join).reset_index(name='NewColumn'))
   ItemID                         NewColumn
0    6547  Soft and amazing, Horrific color
1    8988       I hated it, Best chair ever
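
groupby sorts by the group keys by default, which is why 6547 comes first in the output here. If you want the order shown in the question (8988 first), passing sort=False should keep the groups in order of first appearance. And since the question mentions an array, applying list instead of join gives one list of feedback strings per item. A small sketch, assuming the sample data from the question; `result` and `as_lists` are illustrative names:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ItemID": [8988, 8988, 6547, 6547],
    "Description": ["Tall Chair", "Tall Chair", "Big Pillow", "Big Pillow"],
    "Feedback": ["I hated it", "Best chair ever",
                 "Soft and amazing", "Horrific color"],
})

# sort=False keeps groups in the order they first appear in df
grouped = df.groupby(["ItemID", "Description"], sort=False)["Feedback"]

# One comma-separated string per item
result = grouped.apply(", ".join).reset_index(name="NewColumn")

# One Python list per item, which may be handier for later label processing
as_lists = grouped.apply(list).reset_index(name="NewColumn")

print(result)
print(as_lists)
```

The list version can be easier to work with later, because you don't have to split the strings again if a feedback text itself contains a comma.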
