pandas pivot to transform DataFrame

Question

I'm trying to use pivot to reshape a DataFrame into the layout I want. That is what pivot is meant for, but here it is not working.

Any expert insight please.

DataFrame:

>>> df1
     id   item value
0  2225  prize   1.5
1  2225   unit    kg
2  2225  prize   2.4
3  8187   unit    lt
4  1401  stock    10
5  1401  prize   4.3

When I run pivot, I get the error below:

>>> df1.pivot('id', 'item')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/frame.py", line 4359, in pivot
    return pivot(self, index=index, columns=columns, values=values)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 381, in pivot
    return indexed.unstack(columns)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/frame.py", line 4546, in unstack
    return unstack(self, level, fill_value)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 469, in unstack
    return _unstack_frame(obj, level, fill_value=fill_value)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 488, in _unstack_frame
    fill_value=fill_value)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 116, in __init__
    self._make_selectors()
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 154, in _make_selectors
    raise ValueError('Index contains duplicate entries, '
ValueError: Index contains duplicate entries, cannot reshape

Even pivot_table produces an error:

>>> df1.pivot_table(columns='item', values='value')

I tried following a GitHub reference but did not understand it.

The desired output should look like this:

      Value
item  prize  stock  unit
id
2225  1.5  10  2.4  lt

id 2225 has two prize entries, but the expected output ignores one of them. Why is that? Also see what @PV8 said.
– anky, Commented Aug 28, 2019 at 8:19
The question is how to aggregate the numeric values and how to aggregate the non-numeric ones.
– jezrael, Commented Aug 28, 2019 at 8:26
I meant that the expected output does not seem correct. If the prize values were summed, id 2225 would get 3.9, but you are taking only the first prize value for id 2225, for example.
– anky, Commented Aug 28, 2019 at 9:11

Deven · Accepted Answer · 2019-08-28 08:26:55Z

3

You should try the following:

pd.pivot_table(df1, values='value', index=['id'], columns=['item'], aggfunc='sum')

As jezrael mentions below, aggfunc can be mean for the numeric values and join for the strings.
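jezrael's comment on this answer warns that summing strings concatenates them. The sketch below shows that effect next to an explicit join. The frame is hypothetical (it is not the question's data) and assumes the duplicated values are plain strings:

```python
import pandas as pd

# Hypothetical frame: id 1 has two 'unit' rows holding strings.
df = pd.DataFrame({
    "id": [1, 1, 2],
    "item": ["unit", "unit", "unit"],
    "value": ["aa", "bb", "cc"],
})

# Summing strings concatenates them with no separator.
summed = df.pivot_table(index="id", columns="item", values="value", aggfunc="sum")

# Joining explicitly keeps the duplicated entries readable.
joined = df.pivot_table(index="id", columns="item", values="value", aggfunc=", ".join)

print(summed)
print(joined)
```

For id 1 the first table should hold aabb and the second aa, bb, which is why a join with a separator is usually the safer choice for text.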

edited Aug 28, 2019 at 8:26

answered Aug 28, 2019 at 8:12


2 Comments

Deven Over a year ago

Let me know if you still face issues.

jezrael Over a year ago

Yes, it works, but summing strings joins them with no separator (for example, aa and bb become aabb), so be careful.

Sebastien · Accepted Answer · 2019-08-28 08:21:48Z

2

According to the pivot documentation:

Raises
------
ValueError:
    When there are any `index`, `columns` combinations with multiple
    values. `DataFrame.pivot_table` when you need to aggregate.

In your case, id=2225 has two prize entries, which pivot cannot handle. You could aggregate first and then pivot:

df1.groupby(['id', 'item']).sum().reset_index().pivot(index='id', columns='item', values='value')

+------+-------+-------+------+
| item | prize | stock | unit |
+------+-------+-------+------+
| id   |       |       |      |
| 1401 | 4.3   | 10    | NaN  |
| 2225 | 3.9   | NaN   | kg   |
| 8187 | NaN   | NaN   | lt   |
+------+-------+-------+------+
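To see which rows cause the ValueError before deciding how to aggregate, a quick check along these lines may help. This is a minimal sketch that rebuilds the question's data by hand, assuming the value column mixes floats and strings; the name dupes is only illustrative:

```python
import pandas as pd

# Data from the question; 'value' is assumed to mix numbers and strings.
df1 = pd.DataFrame({
    "id": [2225, 2225, 2225, 8187, 1401, 1401],
    "item": ["prize", "unit", "prize", "unit", "stock", "prize"],
    "value": [1.5, "kg", 2.4, "lt", 10, 4.3],
})

# keep=False flags every row that belongs to a repeated (id, item) pair,
# not just the second and later occurrences.
dupes = df1[df1.duplicated(subset=["id", "item"], keep=False)]
print(dupes)
```

With this data the check should return rows 0 and 2, the two prize entries for id 2225.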

answered Aug 28, 2019 at 8:21


3 Comments

Karn Kumar Over a year ago

Thanks @Sebastien for the answer and details, appreciate your help.

Sebastien Over a year ago

You're welcome @pygo, though I think @DevendraSwami's solution with pivot_table and aggfunc might be more elegant than mine.

Karn Kumar Over a year ago

Okay, @Sebastien, thanks for the thought. I'll accept that.

jezrael · Accepted Answer · 2019-08-28 08:30:30Z

2

The problem with the data is that it contains duplicates and that the value column mixes numeric values with strings.

A general solution, if you need the mean for numeric values and a join for duplicated strings:

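def f(x):
    # Try to read the group's values as numbers; non-numeric entries become NaN
    y = pd.to_numeric(x, errors='coerce')
    if y.isna().all():
        # Nothing numeric in this group: join the strings
        return ', '.join(x)
    else:
        # Otherwise average the numeric values
        return y.mean()

df = df1.pivot_table(index='id', columns='item', values='value', aggfunc=f)
print(df)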
item prize stock unit
id                   
1401   4.3    10  NaN
2225  1.95   NaN   kg
8187   NaN   NaN   lt

Another idea is to aggregate the numeric and non-numeric values separately and then concatenate the results:

df1['value1'] = pd.to_numeric(df1['value'], errors='coerce')

df2 = df1.pivot_table(index='id', columns='item', values='value1', aggfunc='mean')

df3 = (df1[df1['value1'].isna()]
       .pivot_table(index='id', columns='item', values='value', aggfunc=','.join))

df = pd.concat([df2, df3], axis=1)
print(df)

item  prize  stock unit
id                     
1401   4.30   10.0  NaN
2225   1.95    NaN   kg
8187    NaN    NaN   lt

edited Aug 28, 2019 at 8:30

answered Aug 28, 2019 at 8:20


1 Comment

Karn Kumar Over a year ago

Thanks for adding a detailed answer @jezrael, appreciate it.

Grégoire Roussel · Accepted Answer · 2019-08-28 08:23:05Z

Pandas is complaining because the entry (2225, prize) appears twice, at indices 0 and 2. This is a problem with the data rather than pandas misbehaviour.

Fixing this duplicate entry removes the error:

# the original database
>>> df
     id   item value
0  2225  prize   1.5
1  2225   unit    kg
2  2225  prize   2.4
3  8187   unit    lt
4  1401  stock    10
5  1401  prize   4.3

# removing the duplicate error by changing index 2
>>> df.loc[2, 'id'] = 8187
>>> df
     id   item value
0  2225  prize   1.5
1  2225   unit    kg
2  8187  prize   2.4
3  8187   unit    lt
4  1401  stock    10
5  1401  prize   4.3

# pivot now works properly
>>> df.pivot(index='id', columns='item')
     value
item prize stock unit
id
1401   4.3    10  NaN
2225   1.5   NaN   kg
8187   2.4   NaN   lt
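If the data cannot be corrected at the source, another option is to keep only the first row for each (id, item) pair and then pivot, which appears to be what the question's expected output does for id 2225. A minimal sketch, assuming the value column mixes floats and strings and that discarding the later duplicate is acceptable; deduped and wide are illustrative names:

```python
import pandas as pd

# Data from the question; 'value' is assumed to mix numbers and strings.
df1 = pd.DataFrame({
    "id": [2225, 2225, 2225, 8187, 1401, 1401],
    "item": ["prize", "unit", "prize", "unit", "stock", "prize"],
    "value": [1.5, "kg", 2.4, "lt", 10, 4.3],
})

# Keep only the first row for each (id, item) pair, dropping row 2.
deduped = df1.drop_duplicates(subset=["id", "item"], keep="first")

# With unique (id, item) pairs, pivot no longer raises ValueError.
wide = deduped.pivot(index="id", columns="item", values="value")
print(wide)
```

Unlike editing the id by hand, this leaves df1 itself unchanged, but it silently discards the second prize value (2.4), so it only fits cases where the later rows really are redundant.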
