2

I'm Just trying to play will pivot to transform the data Frame in a way i desired which pivot does but here its not working.

Any expert insight please.

DataFrame:

>>> df1
     id   item value
0  2225  prize   1.5
1  2225   unit    kg
2  2225  prize   2.4
3  8187   unit    lt
4  1401  stock    10
5  1401  prize   4.3

While running pivot i see the below error..

>>> df1.pivot('id', 'item')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/frame.py", line 4359, in pivot
    return pivot(self, index=index, columns=columns, values=values)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 381, in pivot
    return indexed.unstack(columns)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/frame.py", line 4546, in unstack
    return unstack(self, level, fill_value)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 469, in unstack
    return _unstack_frame(obj, level, fill_value=fill_value)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 488, in _unstack_frame
    fill_value=fill_value)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 116, in __init__
    self._make_selectors()
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 154, in _make_selectors
    raise ValueError('Index contains duplicate entries, '
ValueError: Index contains duplicate entries, cannot reshape

Even pivot_table producing errors..

>>> df1.pivot_table(columns='item', values='value')

I tried to github link reference but didn't get it.

Desired should be like:

      Value

item  prize  stock  unit
    id

2225  1.5  10  2.4  lt
6
  • 1
    where is oyur stock value for 2225 coming from? Commented Aug 28, 2019 at 8:17
  • 1
    id 2225 has 2 prize , the expected output ignores 1 , why is that? aso what @PV8 said. Commented Aug 28, 2019 at 8:19
  • 1
    There is question - how aggregate numeric, how aggregate non numeric? Commented Aug 28, 2019 at 8:26
  • @anky_91, i didn't get the hold what you asked. Commented Aug 28, 2019 at 9:10
  • I meant the expected output doesnot seem correct, the prize if summed up should be 4.30 , you are just taking the first prize value for 1 id 2225 for example Commented Aug 28, 2019 at 9:11

4 Answers 4

3

you should try the following:

pd.pivot_table(df1, values='value', index=['id'], columns=['item'], aggfunc=np.sum)

And as mentioned below by jezrael, aggfunc can be mean for numeric values and for strings can be join.

Sign up to request clarification or add additional context in comments.

2 Comments

Let me know if you still face issues.
ya, working, but sum of strings is join with no space - like aa, bb to aabb, so be careful
2

According to the pivot doc

Raises
------
ValueError:
    When there are any `index`, `columns` combinations with multiple
    values. `DataFrame.pivot_table` when you need to aggregate.

In your case, id=2225 has 2 prize entries, which is not handled by pivot. You could aggregate first and then pivot:

df1.groupby(['id', 'item']).sum().reset_index().pivot('id', 'item', 'value')

+------+-------+-------+------+
| item | prize | stock | unit |
+------+-------+-------+------+
| id   |       |       |      |
| 1401 | 4.3   | 10    | NaN  |
| 2225 | 3.9   | NaN   | kg   |
| 8187 | NaN   | NaN   | lt   |
+------+-------+-------+------+

3 Comments

Thanks @Sebastien for the answer and details, appreciate your help.
You're welcome @pygo, though I think @ DevendraSwami 's solution with pivot_table and aggfunc might be more elegant than mine.
Okay, @Sebastien .. thanks for the thought , i'll accept that.
2

Problem of data is there are duplicates and also mixed numeric with strings in column value.

General solution - if need for numeric values mean and for duplicates strings join:

def f(x):
    y = pd.to_numeric(x, errors='coerce')
    if y.isna().all():
        return ', '.join(x)
    else:
        return  y.mean()

df = df1.pivot_table(index='id',columns='item', values='value', aggfunc=f)
print (df)
item prize stock unit
id                   
1401   4.3    10  NaN
2225  1.95   NaN   kg
8187   NaN   NaN   lt

Another idea is aggregate numeric and non numeric separate and then concat together:

df1['value1'] = pd.to_numeric(df1['value'], errors='coerce')

df2 = df1.pivot_table(index='id',columns='item', values='value1', aggfunc='mean')

df3 = df1[df1['value1'].isna()]
            .pivot_table(index='id',columns='item', values='value', aggfunc=','.join)

df = pd.concat([df2, df3], axis=1)
print (df)

item  prize  stock unit
id                     
1401   4.30   10.0  NaN
2225   1.95    NaN   kg
8187    NaN    NaN   lt

1 Comment

Thanks for the adding a detailed answer @Jezrael, appreciate it.
-1

Pandas is complaining about the fact that you have the entry (2225, prize) twice at indices 0 and 2. This is a database problem rather than a pandas misbehaviour.

Fixing this duplicate entry removes the error:

# the original database
>>> df 
     id   item value                                                                                                                                                                                                                          
0  2225  prize   1.5                                                                                                                                                                                                                          
1  2225   unit    kg                                                                                                                                                                                                                          
2  2225  prize   2.4                                                                                                                                                                                                                          
3  8187   unit    lt                                                                                                                                                                                                                          
4  1401  stock    10                                                                                                                                                                                                                          
5  1401  prize   4.3  

# removing the duplicate error by changing index 2
>>> df.loc[2, 'id'] = 8187
>>> df
     id   item value
0  2225  prize   1.5
1  2225   unit    kg
2  8187  prize   2.4
3  8187   unit    lt
4  1401  stock    10
5  1401  prize   4.3

# pivot now works properly
>>> df.pivot('id', 'item')
     value
item prize stock unit
id
1401   4.3    10  NaN
2225   1.5   NaN   kg
8187   2.4   NaN   lt

1 Comment

Thanks @Gregorie for attempting an answer.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.