39

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'id': [2967, 5335, 13950, 6141, 6169],
                   'Player': ['Cedric Hunter', 'Maurice Baker',
                              'Ratko Varda', 'Ryan Bowen', 'Adrian Caldwell'],
                   'Year': [1991, 2004, 2001, 2009, 1997],
                   'Age': [27, 25, 22, 34, 31],
                   'Tm': ['CHH', 'VAN', 'TOT', 'OKC', 'DAL'],
                   'G': [6, 7, 60, 52, 81]})


df.set_index('Player', inplace=True)

It shows:

Out[128]:

                 Age   G   Tm  Year     id
Player
Cedric Hunter     27   6  CHH  1991   2967
Maurice Baker     25   7  VAN  2004   5335
Ratko Varda       22  60  TOT  2001  13950
Ryan Bowen        34  52  OKC  2009   6141
Adrian Caldwell   31  81  DAL  1997   6169

How can I sort by the index ('Player') using some arbitrary order? For example, as in the following.

reorderlist = ['Maurice Baker',
               'Adrian Caldwell',
               'Ratko Varda',
               'Ryan Bowen',
               'Cedric Hunter']
2
  • So you want the rows in the same order as the names in the list reorderlist? Commented Apr 25, 2018 at 0:49
  • The 'right' way for pandas to implement this is to allow Categoricals as indices as R does; currently pandas doesn't, it gives error. Commented Apr 25, 2018 at 1:07

6 Answers

59

Just reindex

df.reindex(reorderlist)
Out[89]: 
                 Age   G   Tm  Year     id
Player                                    
Maurice Baker     25   7  VAN  2004   5335
Adrian Caldwell   31  81  DAL  1997   6169
Ratko Varda       22  60  TOT  2001  13950
Ryan Bowen        34  52  OKC  2009   6141
Cedric Hunter     27   6  CHH  1991   2967

Update: if you have multiple players with the same name

out = df.iloc[pd.Categorical(df.index,reorderlist).argsort()]
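
A minimal sketch of why the update matters, assuming a hypothetical frame `dup` in which one name appears twice (the data is made up for illustration and is not from the question). `reindex` refuses an index with duplicate labels, whereas sorting on the categorical codes keeps both rows:

```python
import pandas as pd

reorderlist = ['Maurice Baker', 'Adrian Caldwell', 'Ratko Varda',
               'Ryan Bowen', 'Cedric Hunter']

# Hypothetical data: 'Ryan Bowen' appears twice in the index.
dup = pd.DataFrame(
    {'G': [6, 7, 60, 52, 81, 40]},
    index=pd.Index(['Cedric Hunter', 'Maurice Baker', 'Ratko Varda',
                    'Ryan Bowen', 'Adrian Caldwell', 'Ryan Bowen'],
                   name='Player'))

try:
    dup.reindex(reorderlist)
    reindex_failed = False
except ValueError:
    # reindex cannot handle an index with duplicate labels
    reindex_failed = True

# Each label is encoded by its position in reorderlist; argsort then
# yields the row positions that put those codes in ascending order.
codes = pd.Categorical(dup.index, categories=reorderlist)
out = dup.iloc[codes.argsort()]
```

The relative order of the two tied 'Ryan Bowen' rows is not something this sketch controls.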

3 Comments

Hello, I have tried this. The players come out in the order of reorderlist, but all the other values are NaN. I want exactly the output above, with the values.
This does not work when there are players having the same name.
@DiegoFMedina check the update
11

As of Pandas 1.1 DataFrame.sort_values has a key param that takes a callable to control sorting. So you could use an approach like the following:

def sorter(column):
    reorder = [
        "Maurice Baker",
        "Adrian Caldwell",
        "Ratko Varda",
        "Ryan Bowen",
        "Cedric Hunter",
    ]
    # This also works:
    # mapper = {name: order for order, name in enumerate(reorder)}
    # return column.map(mapper)
    cat = pd.Categorical(column, categories=reorder, ordered=True)
    return pd.Series(cat)

df_sorted = df.sort_values(by="Player", key=sorter)

There may be some practical differences between using pd.Categorical and the column.map alternative I put in the comments. For example, see these caveats. I'm showing both for completeness. I also haven't tested how this compares performance-wise to the current accepted solution that uses df.reindex. The best approach might be different when you have a MultiIndex in play too.
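
Since Player is the index in the question, the same idea can also be written with DataFrame.sort_index, which likewise accepts a key argument as of pandas 1.1. Below is a minimal sketch of the column.map variant applied to the index. The `rank` dictionary is a name introduced here for illustration, and the frame is a cut-down copy of the question's data:

```python
import pandas as pd

df = pd.DataFrame(
    {'Player': ['Cedric Hunter', 'Maurice Baker', 'Ratko Varda',
                'Ryan Bowen', 'Adrian Caldwell'],
     'G': [6, 7, 60, 52, 81]}).set_index('Player')

reorderlist = ['Maurice Baker', 'Adrian Caldwell', 'Ratko Varda',
               'Ryan Bowen', 'Cedric Hunter']

# Map each name to its position in the desired order.
rank = {name: pos for pos, name in enumerate(reorderlist)}

# The key receives the index and must return something of the same length;
# names missing from `rank` would map to NaN and sort last.
df_sorted = df.sort_index(key=lambda idx: idx.map(rank))
```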

1 Comment

Not all heroes wear capes! Some wear scarves, apparently!
4

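To get a custom sort order on the index, declare it as an ordered categorical and pass the desired order explicitly through categories (without categories, the order defaults to the sorted names):

player_order = ['Maurice Baker', 'Adrian Caldwell', 'Ratko Varda', 'Ryan Bowen', 'Cedric Hunter']
df.index = pd.CategoricalIndex(df.index, categories=player_order, ordered=True)

Passing a bare pd.Categorical to set_index did not work when I tried it: df.set_index(keys=pd.Categorical(player_order, ordered=True), inplace=True) raised TypeError: unhashable type: 'Categorical'

So wrap the existing index in a pd.CategoricalIndex as above and then call df.sort_index(), which follows the category order. Note that the level argument of sort_index expects level names or positions, not the desired order.

2 Comments

Please give a solution, not '...' dots
@jean-loup: I already gave the solution here, but to be 200% clear: df.sort_index() once the index is an ordered CategoricalIndex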
1

If more than one column needs to be sorted, in my experience it works well to use map to convert the string values to numbers and then use sort_values:

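# Step 1/3: create a dictionary that maps each name to its sort position
convert_dict = {'Maurice Baker': 1,
                'Adrian Caldwell': 2,
                'Ratko Varda': 3}  # continue through the last name in the desired order

# Step 2/3: create a helper column `new` mapped from `Player`
# (this assumes `Player` is a regular column; if it is the index,
# as in the question, call df.reset_index(inplace=True) first)
df['new'] = df['Player'].map(convert_dict)

# Step 3/3: sort by the helper column, then drop it
# (names missing from convert_dict map to NaN and sort last)
df.sort_values(by=['new'], ignore_index=True, inplace=True)
df.drop(columns=['new'], inplace=True)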
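
To illustrate the multi-column case, here is a minimal sketch under assumed data: the player labels, the `tm_rank` dictionary and the `_rank` helper column are all hypothetical names introduced for this example. It sorts by a custom team order first and breaks ties by Year:

```python
import pandas as pd

# Hypothetical data with repeated teams so that the tie-break is visible.
df = pd.DataFrame({'Player': ['A', 'B', 'C', 'D'],
                   'Tm': ['DAL', 'VAN', 'DAL', 'VAN'],
                   'Year': [2001, 1999, 1997, 2004]})

# Custom order for the team column: VAN before DAL.
tm_rank = {'VAN': 0, 'DAL': 1}

out = (df.assign(_rank=df['Tm'].map(tm_rank))        # temporary helper column
         .sort_values(['_rank', 'Year'], ignore_index=True)
         .drop(columns='_rank'))
```

Using assign keeps the helper column out of the original frame, so nothing has to be cleaned up in place.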


0

To sort in an arbitrary order without introducing blank rows, I found df.filter to work while testing BENY's answer. It sorts as desired and, unlike df.reindex, does not add empty (NaN) rows for keys in the list that have no data.

df.filter(reorderlist, axis=0)

                    id  Year  Age   Tm   G
Player                                    
Maurice Baker     5335  2004   25  VAN   7
Adrian Caldwell   6169  1997   31  DAL  81
Ratko Varda      13950  2001   22  TOT  60
Ryan Bowen        6141  2009   34  OKC  52
Cedric Hunter     2967  1991   27  CHH   6

# Extra keys in the list don't add empty rows; rows whose key is not in the list are dropped
reorderlist.append('LeBron James')
reorderlist.remove('Adrian Caldwell')
df.filter(reorderlist, axis=0)

                  id  Year  Age   Tm   G
Player                                  
Maurice Baker   5335  2004   25  VAN   7
Ratko Varda    13950  2001   22  TOT  60
Ryan Bowen      6141  2009   34  OKC  52
Cedric Hunter   2967  1991   27  CHH   6
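
For a side-by-side view of the difference from reindex, here is a minimal sketch on a cut-down copy of the question's data. The `wanted` list and the variable names are introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {'G': [6, 7, 60]},
    index=pd.Index(['Cedric Hunter', 'Maurice Baker', 'Ratko Varda'],
                   name='Player'))

# 'LeBron James' is not in the data; 'Maurice Baker' is not in the list.
wanted = ['Ratko Varda', 'LeBron James', 'Cedric Hunter']

via_reindex = df.reindex(wanted)              # adds a NaN row for the unknown name
via_filter = df.filter(items=wanted, axis=0)  # keeps only names that exist
```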


0

Simplest way I've found is to just pass the list to .loc, but this won't work if the index is not unique.

df = df.loc[reorderlist, :]
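
One more caveat worth checking, sketched here under the assumption of pandas 1.0 or later: .loc raises a KeyError if any label in the list is absent from the index. Filtering the list first avoids that. The `present` name and the cut-down data are introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {'G': [6, 7, 60]},
    index=pd.Index(['Cedric Hunter', 'Maurice Baker', 'Ratko Varda'],
                   name='Player'))

reorderlist = ['Ratko Varda', 'LeBron James', 'Cedric Hunter']

try:
    df.loc[reorderlist, :]
    raised = False
except KeyError:
    # at least one requested label is missing from the index
    raised = True

# Keep only the labels that actually exist, preserving the list order.
present = [name for name in reorderlist if name in df.index]
out = df.loc[present, :]
```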
