
Let's say I have a Pandas DataFrame with two columns, like:

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [100, 200, 300, 400]})
print(df)

   a    b
0  1  100
1  2  200
2  3  300
3  4  400

And let's say I also have a Pandas Series, like:

s = pd.Series([1, 3, 2, 4])
print(s)

0    1
1    3
2    2
3    4
dtype: int64

How can I reorder the rows of df so that the a column follows the same order as the s series, with the corresponding b values moving along with their rows?

My desired output would be:

   a    b
0  1  100
1  3  300
2  2  200
3  4  400

Is there any way to achieve this?

Please check self-answer below.

2
  • Is series s completely separate from dataframe df? Where does it come from? It makes things easier if you concat series s into df. Then you can simply do df.sort_values('s'). Commented Mar 6, 2020 at 14:31
  • I believe your suggestion would not be valid, since he is trying to maintain the row order from the first dataframe. If he concatenates the series s, the values of b will be connected to other values of a. Commented Jun 21, 2024 at 4:55

2 Answers

4

What about:

(
    df.assign(s=s)
    .sort_values(by='s')
    .drop('s', axis=1)
)
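
One caveat worth checking: sort_values orders the rows by the values of s as aligned on the index, which reproduces the order of s only when that order is its own inverse permutation, as in the question's example (two rows swapped). The sketch below uses a hypothetical series s2 that is not from the question, and the merge-based alternative is my own assumption rather than part of this answer:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [100, 200, 300, 400]})
s2 = pd.Series([2, 3, 1, 4])  # hypothetical order, not from the question

# assign pairs s2 with the rows by index label, then the rows are sorted by s2.
by_sort = df.assign(s=s2).sort_values(by='s').drop('s', axis=1)
print(by_sort['a'].tolist())

# A left merge keeps the key order of the left frame, so the rows follow s2.
by_merge = s2.to_frame('a').merge(df, on='a', how='left')
print(by_merge['a'].tolist())
```

If the two printed lists differ for your data, the merge (or one of the index-based solutions in the other answer) is probably the safer choice.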

3

I have run into this issue quite often, so I thought I would share my solutions in pandas.

Solutions:

Solution 1:

Use set_index to make the a column the index, reindex to put the rows in the order of s, rename_axis to set the index name back to a, and reset_index to turn the index back into a column:

print(df.set_index('a').reindex(s).rename_axis('a').reset_index('a'))

Solution 2:

Use set_index to make the a column the index, loc to select the rows in the order of s, and reset_index to turn the index back into a column:

print(df.set_index('a').loc[s].reset_index())
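
Solutions 1 and 2 behave differently when s contains a value that does not occur in the a column. A small sketch of that difference, using a hypothetical series s_missing that is not part of the question:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [100, 200, 300, 400]})
s_missing = pd.Series([1, 5, 2])  # hypothetical: 5 does not occur in column a

# reindex keeps the unknown label and fills its row with NaN.
via_reindex = df.set_index('a').reindex(s_missing).rename_axis('a').reset_index()
print(via_reindex)

# loc raises KeyError for labels that are not in the index (pandas 1.0 and later).
try:
    df.set_index('a').loc[s_missing]
    loc_raised = False
except KeyError:
    loc_raised = True
print(loc_raised)
```

Which behaviour is preferable depends on whether a missing value should be reported as an error or kept as an empty row.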

Solution 3:

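Use map with list.index to find the position of each value of s in the a column, then pass those positions to iloc to select the rows in that order. The rows keep their original index labels (0, 2, 1, 3); append .reset_index(drop=True) if you need a fresh index: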

print(df.iloc[list(map(df['a'].tolist().index, s))])
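
The position lookup can also be done without a Python-level loop. The following is a sketch of a variation on Solution 3, not part of the original solutions, and it assumes the values in the a column are unique:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [100, 200, 300, 400]})
s = pd.Series([1, 3, 2, 4])

# Position of each value of s within column a.
# A value that is not found would give -1, which iloc reads as "last row",
# so check for -1 first if s may contain unknown values.
positions = pd.Index(df['a']).get_indexer(s)

# iloc keeps the original labels (0, 2, 1, 3); reset_index renumbers them.
result = df.iloc[positions].reset_index(drop=True)
print(result)
```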

Solution 4:

Convert the rows to a list, use sorted with a key argument to order them by the position of each row's a value in s, then build a new DataFrame from the sorted rows with pd.DataFrame:

print(pd.DataFrame(sorted(df.values.tolist(), key=lambda x: s.tolist().index(x[0])), columns=df.columns))
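
The same "sort by position in s" idea can stay inside pandas through the key argument of sort_values. This is a sketch of an alternative, not one of the original solutions; the rank dictionary is a helper name introduced here, and the key argument needs pandas 1.1 or later:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [100, 200, 300, 400]})
s = pd.Series([1, 3, 2, 4])

# Map each value of s to its position: {1: 0, 3: 1, 2: 2, 4: 3}.
rank = {value: position for position, value in enumerate(s)}

# The key function replaces column a by those positions before sorting.
result = df.sort_values('a', key=lambda col: col.map(rank)).reset_index(drop=True)
print(result)
```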

Timings:

Timed with the code below:

import pandas as pd
from timeit import timeit
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [100, 200, 300, 400]})
s = pd.Series([1, 3, 2, 4])
def u10_1():
    return df.set_index('a').reindex(s).rename_axis('a').reset_index('a')
def u10_2():
    return df.set_index('a').loc[s].reset_index()
def u10_3():
    return df.iloc[list(map(df['a'].tolist().index, s))]
def u10_4():
    return pd.DataFrame(sorted(df.values.tolist(), key=lambda x: s.tolist().index(x[0])), columns=df.columns)
print('u10_1:', timeit(u10_1, number=1000))
print('u10_2:', timeit(u10_2, number=1000))
print('u10_3:', timeit(u10_3, number=1000))
print('u10_4:', timeit(u10_4, number=1000))

Output:

u10_1: 3.012849470495621
u10_2: 3.072132612502147
u10_3: 0.7498072134665241
u10_4: 0.8109911930595484
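
These numbers come from a 4-row frame, so they mostly reflect fixed per-call overhead. Solutions 3 and 4 call list.index once per row, which scans the list each time, so their cost should grow roughly quadratically with the number of rows. A sketch for re-running the comparison on a larger frame follows; the size n, the shuffled series, and the function names are assumptions chosen for illustration, and no timings are claimed here:

```python
import random
from timeit import timeit

import pandas as pd

n = 2000  # hypothetical size, chosen only for illustration
df_big = pd.DataFrame({'a': range(n), 'b': [v * 100 for v in range(n)]})
s_big = pd.Series(random.Random(0).sample(range(n), n))  # shuffled values of column a

def via_reindex():
    # Same approach as Solution 1
    return df_big.set_index('a').reindex(s_big).rename_axis('a').reset_index()

def via_list_index():
    # Same approach as Solution 3: list.index scans the list for every element
    return df_big.iloc[list(map(df_big['a'].tolist().index, s_big))]

print('reindex:   ', timeit(via_reindex, number=10))
print('list.index:', timeit(via_list_index, number=10))
```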

@Allen has a pretty good answer too.
