
Take the following data-frame:

import numpy as np
import pandas as pd

x = np.tile(np.arange(3), 3)
y = np.repeat(np.arange(3), 3)
df = pd.DataFrame({"x": x, "y": y})
   x  y
0  0  0
1  1  0
2  2  0
3  0  1
4  1  1
5  2  1
6  0  2
7  1  2
8  2  2

I need to sort it by x first, and only second by y:

df2 = df.sort(["x", "y"])  # DataFrame.sort was removed in pandas 0.20; newer versions use df.sort_values(["x", "y"])
   x  y
0  0  0
3  0  1
6  0  2
1  1  0
4  1  1
7  1  2
2  2  0
5  2  1
8  2  2
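For readers on current pandas, df.sort() no longer exists and the equivalent call is df.sort_values(). A minimal sketch reproducing the step above shows that the original row labels travel with the rows, which is why the index ends up scrambled:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: x cycles 0,1,2 and y repeats 0,0,0,1,1,1,...
x = np.tile(np.arange(3), 3)
y = np.repeat(np.arange(3), 3)
df = pd.DataFrame({"x": x, "y": y})

# Current-pandas replacement for the removed df.sort(["x", "y"])
df2 = df.sort_values(["x", "y"])

# Each row keeps its original label, so the index is no longer ascending
print(df2.index.tolist())  # [0, 3, 6, 1, 4, 7, 2, 5, 8]
```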

How can I change the index so that it is ascending again? That is, how do I get this:

   x  y
0  0  0
1  0  1
2  0  2
3  1  0
4  1  1
5  1  2
6  2  0
7  2  1
8  2  2

I have tried the following. Unfortunately, it doesn't change the index at all:

df2.reindex(np.arange(len(df2.index)))
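A likely explanation for the failed attempt: reindex selects rows by label, so asking for labels 0 to 8 just pulls the rows back out in their original, unsorted order instead of renumbering them. A small sketch contrasting it with reset_index (using sort_values in place of the removed sort):

```python
import numpy as np
import pandas as pd

x = np.tile(np.arange(3), 3)
y = np.repeat(np.arange(3), 3)
df = pd.DataFrame({"x": x, "y": y})
df2 = df.sort_values(["x", "y"])

# reindex looks rows up BY LABEL: requesting labels 0..8 fetches the rows
# carrying those labels, which restores the original (unsorted) order
restored = df2.reindex(np.arange(len(df2.index)))
print(restored.equals(df))  # True

# reset_index(drop=True) keeps the sorted row order and replaces
# the labels with a fresh 0..n-1 index
relabeled = df2.reset_index(drop=True)
print(relabeled["y"].tolist())  # [0, 1, 2, 0, 1, 2, 0, 1, 2]
```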
    If you don't need a new df, try df.sort_values(["x", "y"], ignore_index=True, inplace=True). Commented Aug 14, 2020 at 22:07

5 Answers


You can reset the index using reset_index to get back a default index of 0, 1, 2, ..., n-1 (and use drop=True to indicate you want to drop the existing index instead of adding it as an additional column to your dataframe):

In [19]: df2 = df2.reset_index(drop=True)

In [20]: df2
Out[20]:
   x  y
0  0  0
1  0  1
2  0  2
3  1  0
4  1  1
5  1  2
6  2  0
7  2  1
8  2  2
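To see what drop=True changes, here is a minimal sketch comparing both settings on the sorted frame from the question. Without it, the old labels survive as a column named "index":

```python
import numpy as np
import pandas as pd

x = np.tile(np.arange(3), 3)
y = np.repeat(np.arange(3), 3)
df2 = pd.DataFrame({"x": x, "y": y}).sort_values(["x", "y"])

# Default (drop=False): the old labels become a new first column "index"
kept = df2.reset_index()
print(kept.columns.tolist())     # ['index', 'x', 'y']
print(kept["index"].tolist())    # [0, 3, 6, 1, 4, 7, 2, 5, 8]

# drop=True: the old labels are discarded entirely
dropped = df2.reset_index(drop=True)
print(dropped.columns.tolist())  # ['x', 'y']
print(dropped.index.tolist())    # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```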

1 Comment

That was super helpful. exp_data = exp_data.reindex(['year'], axis='columns') kept the old index; drop=True removes it.

Since pandas 1.0.0, df.sort_values has a new parameter, ignore_index, which does exactly what you need:

In [1]: df2 = df.sort_values(by=['x','y'],ignore_index=True)

In [2]: df2
Out[2]:
   x  y
0  0  0
1  0  1
2  0  2
3  1  0
4  1  1
5  1  2
6  2  0
7  2  1
8  2  2
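A short sketch, assuming pandas 1.0 or later, checking that ignore_index=True gives the same result whether you assign a new frame or sort in place with inplace=True:

```python
import numpy as np
import pandas as pd  # ignore_index on sort_values requires pandas >= 1.0

x = np.tile(np.arange(3), 3)
y = np.repeat(np.arange(3), 3)
df = pd.DataFrame({"x": x, "y": y})

# New frame: rows sorted, labels renumbered 0..n-1
df2 = df.sort_values(by=["x", "y"], ignore_index=True)

# In-place variant on a copy (sort_values then returns None)
df_inplace = df.copy()
df_inplace.sort_values(by=["x", "y"], ignore_index=True, inplace=True)

print(df2.equals(df_inplace))  # True
print(df2.index.tolist())      # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```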

1 Comment

I think this is new in version 1.0.0.

df.sort() is deprecated (it was removed in pandas 0.20); use df.sort_values(...) instead: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html

Then follow joris' answer and call .reset_index(drop=True) on the sorted DataFrame.
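Put together, the two steps chain into one expression. This is a sketch; since it does not rely on ignore_index, it should also run on pandas versions older than 1.0 that already have sort_values:

```python
import numpy as np
import pandas as pd

x = np.tile(np.arange(3), 3)
y = np.repeat(np.arange(3), 3)
df = pd.DataFrame({"x": x, "y": y})

# sort_values replaces the removed df.sort(); reset_index(drop=True) then
# renumbers the rows. Both return new objects, so assign the result.
df2 = df.sort_values(["x", "y"]).reset_index(drop=True)

print(df2.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(df2["x"].tolist())   # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```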


The following works!

  1. If you want to change the existing dataframe itself, you may directly use

     df.sort_values(by=['col1'], inplace=True)
     df.reset_index(drop=True, inplace=True)
    
     df
     >>     col1  col2  col3 col4
         0    A     2     0    a
         1    A     1     1    B
         2    B     9     9    c
         3    C     4     3    F
         4    D     7     2    e
         5  NaN     8     4    D
    
  2. Otherwise, if you don't want to change the existing dataframe but want to store the sorted result in a separate variable, you may use:

    df_sorted = df.sort_values(by=['col1']).reset_index(drop=True)
    
    df_sorted
    >>     col1  col2  col3 col4
        0    A     2     0    a
        1    A     1     1    B
        2    B     9     9    c
        3    C     4     3    F
        4    D     7     2    e
        5  NaN     8     4    D
    
    df
    >>       col1  col2  col3 col4
          0    A     2     0    a
          1    A     1     1    B
          2    B     9     9    c
          3  NaN     8     4    D
          4    D     7     2    e
          5    C     4     3    F
    
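The answer does not show how df was built. The sketch below reconstructs it from the printed output (an assumption, including that the blank col1 entry is NaN) so both variants can be run. It also passes kind="mergesort", which is not in the original answer: that forces a stable sort, so the two "A" rows are guaranteed to keep their original relative order. NaN ends up last because sort_values defaults to na_position="last".

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the printed output above (assumed, not given)
df = pd.DataFrame({
    "col1": ["A", "A", "B", np.nan, "D", "C"],
    "col2": [2, 1, 9, 8, 7, 4],
    "col3": [0, 1, 9, 4, 2, 3],
    "col4": ["a", "B", "c", "D", "e", "F"],
})

# Variant 2: new sorted frame with a fresh index; df itself is untouched
df_sorted = df.sort_values(by=["col1"], kind="mergesort").reset_index(drop=True)
print(df_sorted["col1"].tolist())  # ['A', 'A', 'B', 'C', 'D', nan]

# Variant 1: sort and renumber df itself in place
df.sort_values(by=["col1"], kind="mergesort", inplace=True)
df.reset_index(drop=True, inplace=True)
print(df.index.tolist())  # [0, 1, 2, 3, 4, 5]
```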


You can set new indices by using set_index:

df2.set_index(np.arange(len(df2.index)))

Output:

   x  y
0  0  0
1  0  1
2  0  2
3  1  0
4  1  1
5  1  2
6  2  0
7  2  1
8  2  2

2 Comments

This is unnecessary; use reset_index() instead.
Unnecessary, but perhaps the most convenient way if you want your index to start at 1 instead of 0 (just add 1 to the argument of set_index).
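Following that last comment, a minimal sketch of a 1-based index via set_index, adding 1 to the np.arange argument:

```python
import numpy as np
import pandas as pd

x = np.tile(np.arange(3), 3)
y = np.repeat(np.arange(3), 3)
df2 = pd.DataFrame({"x": x, "y": y}).sort_values(["x", "y"])

# set_index accepts an array of new labels; shifting by 1 gives 1..n
one_based = df2.set_index(np.arange(len(df2.index)) + 1)
print(one_based.index.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```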
