How to sort a pandas dataframe by a custom order on a string index

Question

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'id': [2967, 5335, 13950, 6141, 6169],
                   'Player': ['Cedric Hunter', 'Maurice Baker',
                              'Ratko Varda', 'Ryan Bowen', 'Adrian Caldwell'],
                   'Year': [1991, 2004, 2001, 2009, 1997],
                   'Age': [27, 25, 22, 34, 31],
                   'Tm': ['CHH', 'VAN', 'TOT', 'OKC', 'DAL'],
                   'G': [6, 7, 60, 52, 81]})


df.set_index('Player', inplace=True)

It shows:

Out[128]:

                 Age   G   Tm  Year     id
Player
Cedric Hunter     27   6  CHH  1991   2967
Maurice Baker     25   7  VAN  2004   5335
Ratko Varda       22  60  TOT  2001  13950
Ryan Bowen        34  52  OKC  2009   6141
Adrian Caldwell   31  81  DAL  1997   6169

How can I sort by the index ('Player') in an arbitrary order, for example the order given by the following list?

reorderlist = ['Maurice Baker',
               'Adrian Caldwell',
               'Ratko Varda',
               'Ryan Bowen',
               'Cedric Hunter']

So you want it ordered the same way as the list reorderlist? – Tenfrow, commented Apr 25, 2018 at 0:49
The 'right' way for pandas to implement this is to allow Categoricals as indices, as R does; currently pandas doesn't and gives an error. – smci, commented Apr 25, 2018 at 1:07

BENY · Accepted Answer · 2022-02-18 19:28:12Z

59

Just reindex

df.reindex(reorderlist)
Out[89]: 
                 Age   G   Tm  Year     id
Player                                    
Maurice Baker     25   7  VAN  2004   5335
Adrian Caldwell   31  81  DAL  1997   6169
Ratko Varda       22  60  TOT  2001  13950
Ryan Bowen        34  52  OKC  2009   6141
Cedric Hunter     27   6  CHH  1991   2967

Update: if you have multiple players with the same name

out = df.iloc[pd.Categorical(df.index,reorderlist).argsort()]
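
A small sketch of why the update matters, assuming a toy frame in which one name is repeated (the data and the names `dup_df`, `order`, and `out` are illustrative and not from the question). `reindex` refuses an index with duplicate labels, while sorting row positions by category code keeps every row:

```python
import pandas as pd

# Hypothetical data: "Maurice Baker" appears twice in the index.
dup_df = pd.DataFrame(
    {"G": [6, 7, 60, 10]},
    index=pd.Index(
        ["Cedric Hunter", "Maurice Baker", "Ratko Varda", "Maurice Baker"],
        name="Player",
    ),
)
reorderlist = ["Maurice Baker", "Ratko Varda", "Cedric Hunter"]

# reindex raises ValueError when the existing index has duplicate labels.
try:
    dup_df.reindex(reorderlist)
    reindex_failed = False
except ValueError:
    reindex_failed = True

# Each label is encoded as its position in reorderlist (its category code);
# argsort on those codes yields the row positions in the desired order.
order = pd.Categorical(dup_df.index, categories=reorderlist).argsort()
out = dup_df.iloc[order]
print(out)
```

Both "Maurice Baker" rows come first, followed by the remaining players in list order. The relative order of the two duplicate rows is not something this sketch relies on.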

edited Feb 18, 2022 at 19:28

answered Apr 25, 2018 at 1:04

3 Comments

tiru Over a year ago

Hello, I have tried this. The players come out in the reorderlist order, but all other values are NaN. I want exactly the output above, with values.

Diego F Medina Over a year ago

This does not work when there are players having the same name.

BENY Over a year ago

@DiegoFMedina check the update
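
One possible cause of the all-NaN result described in the first comment (an assumption, since the commenter's code is not shown) is calling `reindex` while 'Player' is still a regular column, so none of the requested labels exist in the index. A minimal sketch with illustrative data:

```python
import pandas as pd

# Hypothetical frame where 'Player' has NOT been moved into the index.
df = pd.DataFrame({"Player": ["Cedric Hunter", "Maurice Baker"], "G": [6, 7]})
reorderlist = ["Maurice Baker", "Cedric Hunter"]

# The index is 0, 1, so no label matches and every row is filled with NaN.
wrong = df.reindex(reorderlist)

# Setting the index first lets reindex find the labels.
right = df.set_index("Player").reindex(reorderlist)
print(wrong)
print(right)
```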

totalhack · Answer · 2020-09-15 12:47:40Z

11

As of pandas 1.1, DataFrame.sort_values has a key parameter that takes a callable to control sorting, so you could use an approach like the following:

def sorter(column):
    reorder = [
        "Maurice Baker",
        "Adrian Caldwell",
        "Ratko Varda",
        "Ryan Bowen",
        "Cedric Hunter",
    ]
    # This also works:
    # mapper = {name: order for order, name in enumerate(reorder)}
    # return column.map(mapper)
    cat = pd.Categorical(column, categories=reorder, ordered=True)
    return pd.Series(cat)

df_sorted = df.sort_values(by="Player", key=sorter)

There may be some practical differences between using pd.Categorical and the column.map alternative I put in the comments. For example, see these caveats. I'm showing both for completeness. I also haven't tested how this compares performance-wise to the current accepted solution that uses df.reindex. The best approach might be different when you have a MultiIndex in play too.
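
Since 'Player' is the index in the question, the same idea can also be sketched with `DataFrame.sort_index`, which gained a `key` parameter in pandas 1.1 as well. This is a minimal sketch using a cut-down version of the question's data; the name `rank` is an illustrative helper, not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame(
    {"G": [6, 7, 60, 52, 81]},
    index=pd.Index(
        ["Cedric Hunter", "Maurice Baker", "Ratko Varda",
         "Ryan Bowen", "Adrian Caldwell"],
        name="Player",
    ),
)
reorderlist = ["Maurice Baker", "Adrian Caldwell", "Ratko Varda",
               "Ryan Bowen", "Cedric Hunter"]

# Map each name to its position in the desired order.
rank = {name: pos for pos, name in enumerate(reorderlist)}

# key receives the index and must return an index-like of the same length;
# the rows are then sorted by the mapped positions instead of the names.
df_sorted = df.sort_index(key=lambda idx: idx.map(rank))
print(df_sorted)
```

Unlike `reindex`, this does not require unique index labels, because it only reorders the rows that are already there.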

answered Sep 15, 2020 at 12:47

1 Comment

madprogramer Over a year ago

Not all heroes wear capes! Some wear scarves, apparently!

smci · Answer · 2018-12-29 05:11:26Z

4

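To get a custom sort order on your list of strings, declare it as an ordered categorical and pass the order explicitly through categories (if categories is omitted, pandas infers them in sorted order, which loses the custom order):

player_order = pd.Categorical(reorderlist, categories=reorderlist, ordered=True)

When this answer was written, passing the Categorical directly to set_index failed: df.set_index(keys=player_order, inplace=True) raised TypeError: unhashable type: 'Categorical'.

So apply the order to the existing index instead: df.index = pd.CategoricalIndex(df.index, categories=player_order.categories, ordered=True), then call df.sort_index(). Note that the level argument of sort_index expects a level name or position, so df.sort_index(level=player_order) does not apply the custom order.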
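
A minimal sketch of the categorical approach, assuming a pandas version that provides `pd.CategoricalIndex` and using a cut-down version of the question's data. The index is wrapped in a categorical index whose categories carry the custom order, and `sort_index` then sorts by that order:

```python
import pandas as pd

df = pd.DataFrame(
    {"G": [6, 7, 60, 52, 81]},
    index=pd.Index(
        ["Cedric Hunter", "Maurice Baker", "Ratko Varda",
         "Ryan Bowen", "Adrian Caldwell"],
        name="Player",
    ),
)
reorderlist = ["Maurice Baker", "Adrian Caldwell", "Ratko Varda",
               "Ryan Bowen", "Cedric Hunter"]

# The categories list defines the sort order; without it pandas would
# infer the categories alphabetically.
df.index = pd.CategoricalIndex(
    df.index, categories=reorderlist, ordered=True, name="Player"
)

# Sorting a categorical index follows the category order, not the alphabet.
df_sorted = df.sort_index()
print(df_sorted)
```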

edited Dec 29, 2018 at 5:11

answered Apr 25, 2018 at 0:49

2 Comments

jean-loup Over a year ago

Please give a solution, not '...' dots

smci Over a year ago

@jean-loup: I already gave the solution here, but to be 200% clear: df.sort_index(level=player_order)

PTQuoc · Answer · 2022-05-27 08:51:36Z

1

If more than one column needs to be sorted, I use map to convert the string values to numbers, then call sort_values. This assumes 'Player' is a regular column; if it is the index, as in the question, call df = df.reset_index() first.

# Step 1/3: create a dictionary that maps each name to its sort position
convert_dict = {'Maurice Baker': 1,
                'Adrian Caldwell': 2,
                'Ratko Varda': 3}  # continue for the remaining players; unmapped names become NaN and sort last

# Step 2/3: create a helper column `new` mapped from `Player`
df['new'] = df['Player'].map(convert_dict)

# Step 3/3: sort by the helper column, then drop it
df.sort_values(by=['new'], ignore_index=True, inplace=True)
df.drop(columns=['new'], inplace=True)
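
The multi-column case that motivates this approach could look like the following sketch. It assumes 'Player' is a regular column and uses made-up rows; `rank` and `_rank` are illustrative names. Rows are ordered by the custom player order first and by Year within each player:

```python
import pandas as pd

# Hypothetical data with a repeated player, so the second sort key matters.
df = pd.DataFrame({
    "Player": ["Ratko Varda", "Maurice Baker", "Ratko Varda", "Adrian Caldwell"],
    "Year": [2002, 2004, 2001, 1997],
})
reorderlist = ["Maurice Baker", "Adrian Caldwell", "Ratko Varda"]

# Build the name -> position mapping from the list instead of typing it out.
rank = {name: pos for pos, name in enumerate(reorderlist)}

out = (
    df.assign(_rank=df["Player"].map(rank))              # temporary helper column
      .sort_values(by=["_rank", "Year"], ignore_index=True)
      .drop(columns="_rank")                              # remove the helper again
)
print(out)
```

Using `assign` and `drop` in one chain avoids modifying the original frame in place.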

answered May 27, 2022 at 8:51

T. Hall · Answer · 2022-04-03 09:04:45Z

To sort in an arbitrary order without including blank rows, I found that df.filter works while testing out BENY's answer. It sorts as desired and, like df.reindex, tolerates keys that are missing from the index, but it helpfully does not add empty rows for keys that have no data.

df.filter(reorderlist, axis=0)

                    id  Year  Age   Tm   G
Player                                    
Maurice Baker     5335  2004   25  VAN   7
Adrian Caldwell   6169  1997   31  DAL  81
Ratko Varda      13950  2001   22  TOT  60
Ryan Bowen        6141  2009   34  OKC  52
Cedric Hunter     2967  1991   27  CHH   6

# Extra keys don't add empty rows; missing keys are ignored
reorderlist.append('LeBron James')
reorderlist.remove('Adrian Caldwell')
df.filter(reorderlist, axis=0)

                  id  Year  Age   Tm   G
Player                                  
Maurice Baker   5335  2004   25  VAN   7
Ratko Varda    13950  2001   22  TOT  60
Ryan Bowen      6141  2009   34  OKC  52
Cedric Hunter   2967  1991   27  CHH   6
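
To make the difference from `reindex` concrete, here is a side-by-side sketch on a cut-down frame (the list name `wanted` and the data are illustrative). The requested list contains one name that is not in the index and omits one that is:

```python
import pandas as pd

df = pd.DataFrame(
    {"G": [6, 7, 60]},
    index=pd.Index(
        ["Cedric Hunter", "Maurice Baker", "Ratko Varda"], name="Player"
    ),
)
# "LeBron James" is not in the index; "Maurice Baker" is left out on purpose.
wanted = ["Ratko Varda", "LeBron James", "Cedric Hunter"]

# reindex keeps every requested label and fills the unknown one with NaN.
reindexed = df.reindex(wanted)

# filter keeps only the requested labels that exist, in the requested order.
filtered = df.filter(items=wanted, axis=0)
print(reindexed)
print(filtered)
```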

Keyub W · Answer · 2023-11-29 19:15:37Z

0

Simplest way I've found is to just pass the list to .loc, but this won't work if the index is not unique.

df = df.loc[reorderlist, :]
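
A short sketch of the `.loc` approach on a cut-down version of the question's data. One behavior worth knowing, shown here with an illustrative extra name, is that recent pandas versions (1.0 and later) raise a KeyError if any label in the list is absent from the index:

```python
import pandas as pd

df = pd.DataFrame(
    {"G": [6, 7, 60]},
    index=pd.Index(
        ["Cedric Hunter", "Maurice Baker", "Ratko Varda"], name="Player"
    ),
)
reorderlist = ["Maurice Baker", "Ratko Varda", "Cedric Hunter"]

# Selecting by a list of labels returns the rows in the order of the list.
out = df.loc[reorderlist, :]

# A label that is not in the index makes .loc raise instead of adding NaN rows.
try:
    df.loc[reorderlist + ["LeBron James"], :]
    missing_raised = False
except KeyError:
    missing_raised = True
print(out)
```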

answered Nov 29, 2023 at 19:15
