I have a large 2d array with hundreds of columns. I would like to sort it lexicographically, i.e. by first column, then by second column, and so on until the last column. I imagine this should be easy to do but I haven't been able to find a quick way to do this.

1 Answer

This is what numpy.lexsort is for, but the interface is awkward. Pass it a 2D array, and it will argsort the columns, sorting by the last row first, then the second-to-last row, continuing up to the first row:

>>> x
array([[0, 0, 0, 2, 3],
       [2, 3, 2, 3, 2],
       [3, 1, 3, 0, 0],
       [3, 1, 1, 3, 1]])
>>> numpy.lexsort(x)
array([4, 1, 2, 3, 0], dtype=int64)
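To make the key order concrete, here is a minimal sketch with two separate 1D keys instead of a 2D array. The names `primary` and `secondary` and their values are made up for illustration; the only point is that the last key in the sequence is the one lexsort sorts by first.

```python
import numpy as np

# Hypothetical keys, chosen only for illustration.
primary = np.array([1, 1, 0])
secondary = np.array([2, 1, 3])

# lexsort treats the LAST key in the sequence as the primary sort key,
# so the primary key goes at the end of the tuple.
order = np.lexsort((secondary, primary))

# Indices that sort by `primary`, breaking ties with `secondary`.
print(order)
print(primary[order], secondary[order])
```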

If you want to sort by rows, with the first column as the primary key, you need to rotate the array before lexsorting it:

>>> x[numpy.lexsort(numpy.rot90(x))]
array([[0, 0, 0, 2, 3],
       [2, 3, 2, 3, 2],
       [3, 1, 1, 3, 1],
       [3, 1, 3, 0, 0]])
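As a self-contained version of the session above, the following sketch rebuilds the same `x`, sorts its rows, and cross-checks the result against Python's own tuple ordering. The cross-check with `sorted` is an addition for illustration and is not part of the original answer.

```python
import numpy as np

x = np.array([[0, 0, 0, 2, 3],
              [2, 3, 2, 3, 2],
              [3, 1, 3, 0, 0],
              [3, 1, 1, 3, 1]])

# rot90 moves x's first column into the last row of the rotated array,
# which is the row lexsort uses as its primary key.
order = np.lexsort(np.rot90(x))
x_sorted = x[order]

# Sanity check: Python compares tuples lexicographically, so sorting the
# rows as tuples should give the same ordering.
expected = np.array(sorted(map(tuple, x)))
assert np.array_equal(x_sorted, expected)
print(x_sorted)
```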

3 Comments

Great, this seems to work! Next I need to do something like searchsorted on the result, but I'm not sure how. Given a 1D array, I want to find out whether it is one of the sorted 2D array's rows. Any suggestions would be appreciated.
@grigor: maybe [all(row == t) for row in x]
One could add that there's a more time-efficient way of getting the same result as with rot90, by using x[numpy.lexsort(x.T[::-1])]. According to timeit, this is about 25% faster than x[numpy.lexsort(numpy.rot90(x))] (tested for x.shape == (1000,5)).
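The two suggestions in the comments can be sketched roughly as follows. The helper name `contains_row` is hypothetical, and it is a vectorized form of the list-comprehension idea: a linear scan over all rows, not a true binary search like searchsorted, so it does not exploit the sorted order. The sketch makes no claim about timings.

```python
import numpy as np

x = np.array([[0, 0, 0, 2, 3],
              [2, 3, 2, 3, 2],
              [3, 1, 3, 0, 0],
              [3, 1, 1, 3, 1]])

# x.T[::-1] and rot90(x) hold the same keys in the same order,
# so either can be passed to lexsort.
assert np.array_equal(np.rot90(x), x.T[::-1])
x_sorted = x[np.lexsort(x.T[::-1])]


def contains_row(arr, t):
    """Return True if the 1D sequence t equals some row of the 2D array arr.

    Hypothetical helper: compares t against every row at once (linear scan).
    """
    return bool(np.any(np.all(arr == np.asarray(t), axis=1)))


print(contains_row(x_sorted, [3, 1, 1, 3, 1]))
print(contains_row(x_sorted, [3, 1, 1, 3, 2]))
```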
