Pivoting multiple columns with pandas

Question

I've got a dataframe here

    Row dist1   dist2   dist3   variable    value   Smallest    Group   
0   0       40  101 dist2   40  Smallest    SmallestGroup   
1   0       40  101 dist3   101 SecondSmallest  SecondSmallestGroup 
2   1       30  100 dist2   30  Smallest    SmallestGroup   
3   1       30  100 dist3   100 SecondSmallest  SecondSmallestGroup 
4   2   30  20  98  dist2   20  Smallest    SmallestGroup   
5   2   30  20  98  dist1   30  SecondSmallest  SecondSmallestGroup 
6   3   20  15  72  dist2   15  Smallest    SmallestGroup   
7   3   20  15  72  dist1   20  SecondSmallest  SecondSmallestGroup 
8   4   15  16  11  dist3   11  Smallest    SmallestGroup   
9   4   15  16  11  dist1   15  SecondSmallest  SecondSmallestGroup

I'd like to pivot the data so that I have four new columns: Smallest and SecondSmallest (which contains the values from the value column) and SmallestGroup and SecondSmallestGroup (which contains the valus from the variable column).

My desired output is as follows:

Row dist1   dist2   dist3   Smallest    SecondSmallest  SmallestGroup   SecondSmallestGroup
0   NaN      40      101    40                101            dist2         dist3
1   NaN      30      100    30                100            dist2         dist3

I'm certain I can achieve this with a pivot, but am unsure about how to structure the command.

Scott Boston · Accepted Answer · 2018-02-27 19:51:31Z

Since, I don't think the raw data specifies that 'Group' and 'variable' must be asssociated and 'Smallest' and 'value' together, then we have to do this two steps and concatenate.

Let's try this:

(pd.concat([df.set_index(['Row','dist1','dist2','dist3','Group'])['variable'].unstack(),
            df.set_index(['Row','dist1','dist2','dist3','Smallest'])['value'].unstack()],
           axis=1)
   .reset_index())

Output:

   Row  dist1  dist2  dist3 SecondSmallestGroup SmallestGroup  SecondSmallest  Smallest
0    0    NaN     40    101               dist3         dist2             101        40
1    1    NaN     30    100               dist3         dist2             100        30
2    2   30.0     20     98               dist1         dist2              30        20
3    3   20.0     15     72               dist1         dist2              20        15
4    4   15.0     16     11               dist1         dist3              15        11

ilia timofeev · Accepted Answer · 2018-02-27 20:01:39Z

df[['Row','dist1','dist2','dist3']].groupby('Row').first().join(
    df.pivot_table(values = 'value',index= 'Row',columns = ['Smallest'],aggfunc='first')).join(
    df.pivot_table(values = 'variable',index= 'Row',columns = ['Group'],aggfunc='first'))

Output:

     dist1  dist2  dist3  SecondSmallest  Smallest SecondSmallestGroup  \
Row                                                                      
0      NaN     40    101             101        40               dist3   
1      NaN     30    100             100        30               dist3   
2     30.0     20     98              30        20               dist1   
3     20.0     15     72              20        15               dist1   
4     15.0     16     11              15        11               dist1   
SmallestGroup  
Row                
0           dist2  
1           dist2  
2           dist2  
3           dist2  
4           dist3

Collectives™ on Stack Overflow

Pivoting multiple columns with pandas

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related