
I have a pandas DataFrame with two columns: an identifier and the name of a product attached to that identifier. Both the identifiers and the product names contain duplicates. I want to spread the single column of product names across several columns so that each identifier appears only once. Maybe I need to aggregate the product names by identifier.

My DataFrame looks like this:

ID  Product_Name
100  Apple
100  Banana
200  Cherries
200  Apricots
200  Apple
300  Avocados

I want a DataFrame like this:

ID 
100  Apple Banana
200  Cherries Apricots Apple
300  Avocados

Each product for a given identifier has to be in a separate column.

I tried pd.melt, pd.pivot and pd.pivot_table, but I only got errors, such as: No numeric types to aggregate
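For context, pivot_table aggregates with aggfunc="mean" by default, and a mean cannot be computed over strings, which is where a "No numeric types to aggregate" style error comes from. A minimal sketch, assuming the sample data above and a hypothetical helper column n holding each row's position within its ID, shows that a non-numeric aggfunc such as "first" avoids the problem:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [100, 100, 200, 200, 200, 300],
    "Product_Name": ["Apple", "Banana", "Cherries", "Apricots", "Apple", "Avocados"],
})

# Hypothetical helper column: 0, 1, 2, ... within each ID group
df["n"] = df.groupby("ID").cumcount()

# aggfunc="first" keeps the single name in each (ID, n) cell instead of
# trying to average strings, as the default aggfunc="mean" would
wide = pd.pivot_table(df, index="ID", columns="n",
                      values="Product_Name", aggfunc="first")
print(wide)
```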

Any idea how to do this?

2 Answers


Use cumcount to number the products within each ID, add that counter to a MultiIndex with set_index, and reshape with unstack so the counter becomes the new column names:

df = df.set_index(['ID',df.groupby('ID').cumcount()])['Product_Name'].unstack()
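A self-contained version of the set_index/unstack one-liner, a sketch that rebuilds the question's sample data and splits the steps apart so the role of cumcount is visible:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [100, 100, 200, 200, 200, 300],
    "Product_Name": ["Apple", "Banana", "Cherries", "Apricots", "Apple", "Avocados"],
})

# cumcount numbers the rows inside each ID group: 0, 1, 2, ...
position = df.groupby("ID").cumcount()

# (ID, position) becomes a MultiIndex; unstack moves position into the
# columns, filling missing slots with NaN
wide = df.set_index(["ID", position])["Product_Name"].unstack()
print(wide)
```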

Or create a Series of lists and build a new DataFrame with the constructor:

s = df.groupby('ID')['Product_Name'].apply(list)
df = pd.DataFrame(s.values.tolist(), index=s.index)

print (df)
            0         1      2
ID                            
100     Apple    Banana    NaN
200  Cherries  Apricots  Apple
300  Avocados       NaN    NaN
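A self-contained sketch of the Series-of-lists approach. The Product_1, Product_2, ... column labels are a hypothetical addition for readability; the answer itself keeps the integer labels 0, 1, 2.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [100, 100, 200, 200, 200, 300],
    "Product_Name": ["Apple", "Banana", "Cherries", "Apricots", "Apple", "Avocados"],
})

# One list of product names per ID, in original row order
lists = df.groupby("ID")["Product_Name"].apply(list)

# Ragged lists become rows; shorter rows are padded with missing values
wide = pd.DataFrame(lists.tolist(), index=lists.index)

# Hypothetical column labels: 0 -> Product_1, 1 -> Product_2, ...
wide.columns = [f"Product_{i + 1}" for i in wide.columns]
print(wide)
```

Check missing cells with pd.isna rather than comparing to NaN directly, since the padding value in object columns may be None rather than NaN.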

But if you want a two-column DataFrame instead, start from the original df (the code above overwrites df):

df1 = df.groupby('ID')['Product_Name'].apply(' '.join).reset_index(name='new')
print (df1)
    ID                      new
0  100             Apple Banana
1  200  Cherries Apricots Apple
2  300                 Avocados
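A self-contained sketch of the two-column variant. It uses ", " as the separator instead of a single space; that choice is an assumption made so that multi-word product names stay distinguishable inside the joined string.

```python
import pandas as pd

# Sample data from the question (the original long-format df)
df = pd.DataFrame({
    "ID": [100, 100, 200, 200, 200, 300],
    "Product_Name": ["Apple", "Banana", "Cherries", "Apricots", "Apple", "Avocados"],
})

# Join each ID's products into one string, then turn ID back into a column
df1 = df.groupby("ID")["Product_Name"].apply(", ".join).reset_index(name="new")
print(df1)
```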

Use the pivot function; pivoting can do what you need.
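pivot needs a column whose values become the new column labels, and the question's data has no such column. A minimal sketch, assuming a hypothetical helper column n built with cumcount to supply those labels:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [100, 100, 200, 200, 200, 300],
    "Product_Name": ["Apple", "Banana", "Cherries", "Apricots", "Apple", "Avocados"],
})

# Hypothetical helper column: position of each product within its ID
df["n"] = df.groupby("ID").cumcount()

# Each (ID, n) pair is unique, so pivot can reshape without aggregating
wide = df.pivot(index="ID", columns="n", values="Product_Name")
print(wide)
```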
