Pandas pivot table sorting with multiple indexes

Question

Input code:

import pandas as pd
import numpy as np

# Dummy df:
df = pd.DataFrame({
    'Name': ['John', 'Boby', 'Mina', 'Peter', 'Nicky', 'Peter', 'Mina', 'Peter'],
    'City': ['London', 'NY', 'LA', 'London', 'NY', 'HK', 'NY', 'HK'],
    'Stage': ['Masters', 'Graduate', 'Graduate', 'Masters',
              'Graduate', 'Masters', 'Graduate', 'Graduate'],
    'Year': [2020, 2019, 2020, 2019, 2020, 2019, 2020, 2020],
    'Month': [202001, 201902, 202003, 201904, 202005, 201902, 202007, 202012],
    'Earnings': [27, 23, 21, 66, 24, 22, 34, 65]})

df_pivot = pd.pivot_table(df, values='Earnings', index=['Name', 'City', 'Stage'],
                          columns=['Year', 'Month'], aggfunc=np.sum,
                          fill_value=0, margins=True).sort_values('All', ascending=False)
print(df_pivot)

Output pivot table:

Year                    2019          2020                              All
Month                 201902 201904 202001 202003 202005 202007 202012     
Name  City   Stage                                                         
All                       45     66     27     21     24     34     65  282
Peter London Masters       0     66      0      0      0      0      0   66
      HK     Graduate      0      0      0      0      0      0     65   65
Mina  NY     Graduate      0      0      0      0      0     34      0   34
John  London Masters       0      0     27      0      0      0      0   27
Nicky NY     Graduate      0      0      0      0     24      0      0   24
Boby  NY     Graduate     23      0      0      0      0      0      0   23
Peter HK     Masters      22      0      0      0      0      0      0   22
Mina  LA     Graduate      0      0      0     21      0      0      0   21

Desired output, sorted first by the first index column, then within each group by the second column, and finally within each subgroup by the third column:

Year                    2019          2020                              All
Month                 201902 201904 202001 202003 202005 202007 202012     
Name  City   Stage                                                         
All                       45     66     27     21     24     34     65  282
Peter HK     Graduate      0      0      0      0      0      0     65   65
             Masters      22      0      0      0      0      0      0   22
      London Masters       0     66      0      0      0      0      0   66
Mina  NY     Graduate      0      0      0      0      0     34      0   34
      LA     Graduate      0      0      0     21      0      0      0   21
John  London Masters       0      0     27      0      0      0      0   27
Nicky NY     Graduate      0      0      0      0     24      0      0   24
Boby  NY     Graduate     23      0      0      0      0      0      0   23

Note that Peter-HK ranks higher than Peter-London because the sum for Peter-HK (65 + 22) is greater than the sum for Peter-London (66).

In other words: first give me the Name with the biggest total; then, within that Name, the City with the biggest total; then, within that Name and City, the Stage with the biggest total.

Thank you pawel

I'm not certain what the final result should look like. Have you tried sorting again after sorting by 'All'? Like this: df_pivot.sort_values('All', ascending=False).sort_index()
– piRSquared, Commented Feb 24, 2021 at 15:17
Hello, and thank you for the quick response! I have attached the end result as a screenshot from Excel. In short, I want to sort the first column by "All", then the second column by "All", and then the third column by "All". For the end result, that means "Peter" is on top because his All is (60+23); then, for Peter, I want HK first in the City column because its value is 60, and then London with value 23. Does that make sense? Can you look at the attached screenshot? I am unable to paste text, no idea why. Thank you!
– Paweł Poprawski, Commented Feb 24, 2021 at 15:25
Chain a sort_index? df_pivot.sort_values(by="All", ascending=False).sort_index()?
– anky, Commented Feb 24, 2021 at 15:27
I have updated the post; the expected result is at the bottom.
– Paweł Poprawski, Commented Feb 24, 2021 at 15:30

Jan Alexander · Accepted Answer · 2021-02-26 18:53:13Z

2

Edit after understanding the question even better.

You want to sort on maximal score obtained by a person (defined by Name). Then within that person you want to sort on the individual scores obtained by that person.

In your example, I can get the list with the desired sequence of Name in this way:

import pandas as pd
import numpy as np

# Dummy df:
df = pd.DataFrame({
    'Name': ['John', 'Boby', 'Mina', 'Peter', 'Nicky', 'Peter', 'Mina', 'Peter'],
    'City': ['London', 'NY', 'LA', 'London', 'NY', 'HK', 'NY', 'HK'],
    'Stage': ['Masters', 'Graduate', 'Graduate', 'Masters',
              'Graduate', 'Masters', 'Graduate', 'Graduate'],
    'Year': [2020, 2019, 2020, 2019, 2020, 2019, 2020, 2020],
    'Month': [202001, 201902, 202003, 201904, 202005, 201902, 202007, 202012],
    'Earnings': [27, 23, 21, 23, 24, 22, 34, 65]})

# Make the pivot table
df_pivot = pd.pivot_table(df, values='Earnings', index=['Name', 'City', 'Stage'],
                          columns=['Year', 'Month'], aggfunc=np.sum,
                          fill_value=0, margins=True).sort_values('All', ascending=False)
print('Original table')
print(df_pivot)

def sort_groups(df, group_by_col, sort_by_col, F_asc):
    """Sort a dataframe by a certain level of the MultiIndex

    Args:
        df (pd.DataFrame): Dataframe to sort
        group_by_col (str): name of the index level to sort by
        sort_by_col (str): name of the value column to sort by
        F_asc (bool): Ascending sort - True/False

    Returns:
        pd.DataFrame: Dataframe sorted on given multiindex level
    """

    # Make a list of the desired index sequence based on the max value found in each group
    ind = df.groupby(by=group_by_col).max().sort_values(sort_by_col, ascending=F_asc).index.to_list()

    # Return re-indexed dataframe
    return df.reindex(ind, level=df.index.names.index(group_by_col))

# First level sorting: Name
df_pivot_1 = sort_groups(df_pivot, 'Name', 'All', False)
print('\nSort groups at name level:')
print(df_pivot_1)

# Second level sorting: City
# sort=False keeps the Name order from the previous step; pd.concat avoids the
# extra index levels that groupby(...).apply(...) would prepend.
df_pivot_2 = pd.concat([sort_groups(group, 'City', 'All', False)
                        for _, group in df_pivot_1.groupby(by='Name', sort=False)])
print('\nSort groups at city level:')
print(df_pivot_2)

# Third level sorting: Stage
df_pivot_3 = pd.concat([sort_groups(group, 'Stage', 'All', False)
                        for _, group in df_pivot_2.groupby(by=['Name', 'City'], sort=False)])
print('\nSort groups at stage level:')
print(df_pivot_3)

This solution does not place the All row where you indicate it, though. Is that a strict requirement for you?
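The helper above ranks each group by its largest row, whereas the question asks for group sums (Peter-HK at 65 + 22 should outrank Peter-London at 66). Below is a minimal sketch of a sum-based ordering, using the question's data. The names `keys`, `name_total`, `city_total`, and `nested_sorted` are illustrative assumptions, not part of the original answer. Keeping the All row first is a choice made here to match the desired output.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Boby', 'Mina', 'Peter', 'Nicky', 'Peter', 'Mina', 'Peter'],
    'City': ['London', 'NY', 'LA', 'London', 'NY', 'HK', 'NY', 'HK'],
    'Stage': ['Masters', 'Graduate', 'Graduate', 'Masters',
              'Graduate', 'Masters', 'Graduate', 'Graduate'],
    'Year': [2020, 2019, 2020, 2019, 2020, 2019, 2020, 2020],
    'Month': [202001, 201902, 202003, 201904, 202005, 201902, 202007, 202012],
    'Earnings': [27, 23, 21, 66, 24, 22, 34, 65],
})

index_cols = ['Name', 'City', 'Stage']
pivot = pd.pivot_table(df, values='Earnings', index=index_cols,
                       columns=['Year', 'Month'], aggfunc='sum',
                       fill_value=0, margins=True)

# One row per (Name, City, Stage) with its total earnings.
keys = df.groupby(index_cols, as_index=False)['Earnings'].sum()
# Broadcast the per-Name and per-(Name, City) totals back onto every row.
keys['name_total'] = keys.groupby('Name')['Earnings'].transform('sum')
keys['city_total'] = keys.groupby(['Name', 'City'])['Earnings'].transform('sum')

# Largest totals first at each level; the labels themselves break ties so
# that rows of the same Name (and City) always stay together.
keys = keys.sort_values(
    ['name_total', 'Name', 'city_total', 'City', 'Earnings'],
    ascending=[False, True, False, True, False],
)

# Put the margins row first (assumed placement), then the sorted rows.
ordered_rows = [('All', '', '')] + list(
    keys[index_cols].itertuples(index=False, name=None))
nested_sorted = pivot.reindex(
    pd.MultiIndex.from_tuples(ordered_rows, names=index_cols))
print(nested_sorted)
```

Because the sort keys come from the raw data rather than from the pivot table, the same `keys` frame can be extended with another `transform('sum')` column for each extra index level.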

regards,

Jan

edited Feb 26, 2021 at 18:53

answered Feb 24, 2021 at 15:26


11 Comments

Paweł Poprawski Over a year ago

Hi Jan, I tried your code, but it only made my "All" row disappear; no sorting was applied. Can you check the bottom of my post, where I posted the result I expect? Thank you.

Jan Alexander Over a year ago

Thanks for that, now it is clear for me what you intended to obtain. I think the edit of my original post reflects this.

Paweł Poprawski Over a year ago

Thank you for your response. I have edited my post, because your proposal does not work with different values. I would be grateful for your help!

Jan Alexander Over a year ago

In the solution I posted yesterday, you could just move the All column manually I guess.

Paweł Poprawski Over a year ago

Thank you for another response. It works better, but why do I get duplicated columns in this case? When I run the same code you posted, the "Name" column appears twice in my table; if I extend the sorting further (my df has more index columns), all of them are duplicated. Can you run your code to see that the Name column is duplicated? Thank you.
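
The duplicated index levels described in this comment are consistent with how `groupby(...).apply(...)` prepends the group keys to the result by default. Here is a minimal sketch, assuming a hypothetical two-level frame (`frame`, `sort_desc`, and the values are made up for illustration) rather than the full pivot table:

```python
import pandas as pd

# Hypothetical miniature of the pivot table: a (Name, City) index
# and a single total column.
frame = pd.DataFrame(
    {'All': [23, 87, 21, 34]},
    index=pd.MultiIndex.from_tuples(
        [('Peter', 'London'), ('Peter', 'HK'), ('Mina', 'LA'), ('Mina', 'NY')],
        names=['Name', 'City']),
)

def sort_desc(group):
    """Sort one group by its total, largest first."""
    return group.sort_values('All', ascending=False)

# Default behaviour: the group key is added as an extra leading index level.
with_keys = frame.groupby(level='Name').apply(sort_desc)

# group_keys=False keeps the original index levels only.
without_keys = frame.groupby(level='Name', group_keys=False).apply(sort_desc)

print(list(with_keys.index.names))
print(list(without_keys.index.names))
```

Passing `group_keys=False`, or concatenating the sorted groups with `pd.concat`, keeps a single "Name" level.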


ListenSoftware Louise Ai Agent · Accepted Answer · 2021-03-25 16:46:13Z

0

Here is a clean way to combine a groupby with a pivot:

import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Boby', 'Mina', 'Peter', 'Nicky', 'Peter', 'Mina', 'Peter'],
    'City': ['London', 'NY', 'LA', 'London', 'NY', 'HK', 'NY', 'HK'],
    'Stage': ['Masters', 'Graduate', 'Graduate', 'Masters',
              'Graduate', 'Masters', 'Graduate', 'Graduate'],
    'Year': [2020, 2019, 2020, 2019, 2020, 2019, 2020, 2020],
    'Month': [202001, 201902, 202003, 201904, 202005, 201902, 202007, 202012],
    'Earnings': [27, 23, 21, 23, 24, 22, 34, 65]})

grouped = df.groupby(['Name', 'City', 'Stage', 'Year', 'Month'])['Earnings'].sum()
# print(grouped)
grouped = grouped.reset_index(name='Sum')
fp = grouped.pivot(index=['Name', 'City', 'Stage'], columns=['Year', 'Month'], values='Sum').fillna(0)
fp['Totals'] = fp.sum(axis='columns')
# transform('sum') broadcasts each (Name, City) total back to every row of that group
fp['Rank'] = fp['Totals'].groupby(level=['Name', 'City']).transform('sum')

fp = fp.sort_values(by=['Name', 'Rank', 'City', 'Totals'], ascending=[False, False, False, False])

print(fp)
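
A per-row "Rank" column needs one value for every row of the table, while a plain grouped `sum()` returns one value per group. The sketch below contrasts the two on a small hypothetical Series of row totals; `totals`, `per_group`, and `per_row` are illustrative names and the values are made up.

```python
import pandas as pd

# Hypothetical row totals indexed by (Name, City, Stage).
totals = pd.Series(
    [65, 22, 23, 34, 21],
    index=pd.MultiIndex.from_tuples(
        [('Peter', 'HK', 'Graduate'), ('Peter', 'HK', 'Masters'),
         ('Peter', 'London', 'Masters'), ('Mina', 'NY', 'Graduate'),
         ('Mina', 'LA', 'Graduate')],
        names=['Name', 'City', 'Stage']),
    name='Totals',
)

# Aggregation: one entry per (Name, City) pair, indexed by two levels.
per_group = totals.groupby(level=['Name', 'City']).sum()

# Transformation: same index as `totals`, each row carrying its group's sum.
per_row = totals.groupby(level=['Name', 'City']).transform('sum')

print(per_group)
print(per_row)
```

Since `per_row` shares the index of `totals`, it can be assigned as a new column without any index alignment.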

edited Mar 25, 2021 at 16:46

answered Feb 26, 2021 at 0:43


28 Comments

Jan Alexander Over a year ago

OP specifically requested the Name column and City column to stay grouped.

ListenSoftware Louise Ai Agent Over a year ago

If you sort by totals, then that rule conflicts. How would you resolve the issue?

Paweł Poprawski Over a year ago

@Golden Lion, I can see in your output that for Peter, the City "London" is placed higher than HK, but the sum for HK is higher than the one for London, so HK should come first. So first I want to sort by Name, then for each Name I want to sort City, and then for each City under a specific Name I want to sort Stage.

ListenSoftware Louise Ai Agent Over a year ago

I added a Boolean sort list. You can change the sort order for each column. This should give you the results you want.

Paweł Poprawski Over a year ago

@Golden Lion, I tried your code, but it still doesn't work. Sorting for the whole of "Peter" works, but if you look at "Mina", Mina-LA with value 21 is placed higher than Mina-NY with value 34. I would appreciate your help.
