I have to sort a data frame on columns 1 and 2. Column 1 contains both numbers and text; the numbers should come first, in numeric order, followed by the text. This is the default sort order in Excel, but not in pandas, and I couldn't find much on how to do it in the pandas manual.

So this dataframe:

Z   762320  296 1
Z   861349  297 0
1   865545  20  20
1   865584  297 0
22  865625  297 0
2   865628  292 5
10  865662  297 0
1   865665  296 0
11  865694  293 1
1   865700  297 0
10  866429  297 0
11  866438  297 0

should be:

1   865545  20  20
1   865584  297 0
1   865665  296 0
1   865700  297 0
2   865628  292 5
10  865662  297 0
10  866429  297 0
11  865694  293 1
11  866438  297 0
22  865625  297 0
Z   762320  296 1
Z   861349  297 0

When I do df.sort([0,1]) I get:

     0       1    2   3
1    1  865545   20  20
2    1  865584  297   0
3    1  865665  296   0
4    1  865700  297   0
6   10  865662  297   0
7   10  866429  297   0
8   11  865694  293   1
9   11  866438  297   0
5    2  865628  292   5
10  22  865625  297   0
0    Z  762320  296   1
11   Z  861349  297   0

1 Answer

Do you mean column 0 and 1?

>>> df.sort([0, 1])
     0       1    2   3
2    1  865545   20  20
3    1  865584  297   0
7    1  865665  296   0
9    1  865700  297   0
5    2  865628  292   5
6   10  865662  297   0
10  10  866429  297   0
8   11  865694  293   1
11  11  866438  297   0
4   22  865625  297   0 
0    Z  762320  296   1
1    Z  861349  297   0

[update]

The ordering in your question happens when the data is not numeric, i.e. every element is a string:

>>> df.values
array([['Z', '762320', '296', '1'],
       ['Z', '861349', '297', '0'],
       ['1', '865545', '20', '20'],
       ['1', '865584', '297', '0'],
       ['22', '865625', '297', '0'],
       ['2', '865628', '292', '5'],
       ['10', '865662', '297', '0'],
       ['1', '865665', '296', '0'],
       ['11', '865694', '293', '1'],
       ['1', '865700', '297', '0'],
       ['10', '866429', '297', '0'],
       ['11', '866438', '297', '0']], dtype=object)

For strings, character-by-character ordering is the expected result, so '10' sorts before '2':

>>> df.sort([0, 1])    
     0       1    2   3
2    1  865545   20  20
3    1  865584  297   0
7    1  865665  296   0
9    1  865700  297   0
6   10  865662  297   0
10  10  866429  297   0
8   11  865694  293   1
11  11  866438  297   0
5    2  865628  292   5
4   22  865625  297   0
0    Z  762320  296   1
1    Z  861349  297   0
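
The same effect can be reproduced without pandas. A minimal sketch, using only the labels from column 0 of the sample data (the variable names are made up for illustration):

```python
# Labels from column 0 of the question, stored as strings.
labels = ["Z", "1", "22", "2", "10", "11"]

# Strings are compared character by character, so "10" sorts before "2".
as_strings = sorted(labels)
print(as_strings)  # ['1', '10', '11', '2', '22', 'Z']

# Comparing the numeric labels as integers gives the order the question wants.
as_numbers = sorted(int(s) for s in labels if s.isdigit())
print(as_numbers)  # [1, 2, 10, 11, 22]
```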

Try to convert the values first:

>>> import pandas
>>> def convert(v):
...     # Return v as an int when possible; leave text such as 'Z' unchanged.
...     try:
...         return int(v)
...     except ValueError:
...         return v
...
>>> pandas.DataFrame([convert(c) for c in l] for l in df.values)\
...       .sort([0, 1])

     0       1    2   3
2    1  865545   20  20
3    1  865584  297   0
7    1  865665  296   0
9    1  865700  297   0
5    2  865628  292   5
6   10  865662  297   0
10  10  866429  297   0
8   11  865694  293   1
11  11  866438  297   0
4   22  865625  297   0
0    Z  762320  296   1
1    Z  861349  297   0
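
Note that after convert, column 0 holds both ints and the string 'Z'. Python 3 refuses to compare an int with a str (it raises TypeError), so sorting such a mixed column directly may fail there. One way around this, sketched below in plain Python, is a key function that never compares a number with text. The name sort_key is an assumption for illustration, not part of pandas:

```python
def sort_key(value):
    """Order numbers before text; numbers compare numerically."""
    try:
        # Group 0: numeric labels, compared by their integer value.
        return (0, int(value), "")
    except ValueError:
        # Group 1: text labels, compared alphabetically.
        return (1, 0, value)

labels = ["Z", "1", "22", "2", "10", "11"]
ordered = sorted(labels, key=sort_key)
print(ordered)  # ['1', '2', '10', '11', '22', 'Z']
```

Because every key is a tuple of (group, number, text), two keys are always comparable, and the group number decides that text comes last.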

What is the difference? The elements are numeric now:

>>> pandas.DataFrame([convert(c) for c in l] for l in df.values)\
      .sort([0, 1]).values

array([[1.0, 865545.0, 20.0, 20.0],
      [1.0, 865584.0, 297.0, 0.0],
      [1.0, 865665.0, 296.0, 0.0],
      [1.0, 865700.0, 297.0, 0.0],
      [2.0, 865628.0, 292.0, 5.0],
      [10.0, 865662.0, 297.0, 0.0],
      [10.0, 866429.0, 297.0, 0.0],
      [11.0, 865694.0, 293.0, 1.0],
      [11.0, 866438.0, 297.0, 0.0],
      [22.0, 865625.0, 297.0, 0.0],
      ['Z', 762320.0, 296.0, 1.0],
      ['Z', 861349.0, 297.0, 0.0]], dtype=object)

1 Comment

I edited my post to say that I do not get that result; column 0 is sorted as strings.
