I have an n x 2 matrix of integers. The first column is a series of values such as 0, 1, -1, 2, -2; however, they appear in the order in which they were compiled from their constituent matrices. The second column is a list of indices from another list.

I would like to sort the matrix by this second column. This would be equivalent to selecting two columns of data in Excel and sorting by column B (where the data is in columns A and B). Each value in the first column should stay in the same row as its second-column counterpart. I have looked at solutions using the following:

data[np.argsort(data[:, 0])]

But this does not seem to work. The matrix in question looks like this:

matrix([[1, 1],
        [1, 3],
        [1, 7],
        ..., 
        [2, 1021],
        [2, 1040],
        [2, 1052]])

2 Answers

You could use np.lexsort:

numpy.lexsort(keys, axis=-1)

Perform an indirect sort using a sequence of keys.

Given multiple sorting keys, which can be interpreted as columns in a spreadsheet, lexsort returns an array of integer indices that describes the sort order by multiple columns.


In [13]: data = np.matrix(np.arange(10)[::-1].reshape(-1,2))

In [14]: data
Out[14]: 
matrix([[9, 8],
        [7, 6],
        [5, 4],
        [3, 2],
        [1, 0]])

In [15]: temp = data.view(np.ndarray)

In [16]: np.lexsort((temp[:, 1], ))
Out[16]: array([4, 3, 2, 1, 0])

In [17]: temp[np.lexsort((temp[:, 1], ))]
Out[17]: 
array([[1, 0],
       [3, 2],
       [5, 4],
       [7, 6],
       [9, 8]])

Note that if you pass more than one key to np.lexsort, the last key is the primary key, the next-to-last key is the secondary key, and so on.
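As a small illustration of that key ordering, here is a sketch using a made-up 4 x 2 array (the name rows and its values are only for demonstration). Column 1 is passed last, so it acts as the primary key, and column 0 only breaks ties:

```python
import numpy as np

# Hypothetical sample data: column 0 holds values, column 1 holds indices.
rows = np.array([[1, 3],
                 [0, 1],
                 [-1, 3],
                 [2, 1]])

# Keys are listed from least to most significant: the last key
# (column 1) is the primary key, and column 0 breaks ties.
order = np.lexsort((rows[:, 0], rows[:, 1]))
sorted_rows = rows[order]

print(order)
print(sorted_rows)
```

Swapping the two keys in the tuple would sort primarily by column 0 instead.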


Using np.lexsort as I show above requires the use of a temporary array because np.lexsort does not work on numpy matrices. Since temp = data.view(np.ndarray) creates a view, rather than a copy of data, it does not require much extra memory. However,

temp[np.lexsort((temp[:, 1], ))]

is a new array, which does require more memory.
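One way to check the view-versus-copy claim yourself is np.shares_memory. The following is a minimal sketch that rebuilds the same 5 x 2 sample matrix; the name result is just for illustration:

```python
import numpy as np

data = np.matrix([[9, 8], [7, 6], [5, 4], [3, 2], [1, 0]])

# A view reinterprets the same buffer as a plain ndarray.
temp = data.view(np.ndarray)

# Integer-array (fancy) indexing always allocates a new array.
result = temp[np.lexsort((temp[:, 1],))]

print(np.shares_memory(temp, data))    # view: same memory as data
print(np.shares_memory(result, data))  # sorted result: separate memory
```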

There is also a way to sort by columns in-place. The idea is to view the array as a structured array with two columns. Unlike plain ndarrays, structured arrays have a sort method which allows you to specify columns as keys:

In [65]: data.dtype
Out[65]: dtype('int32')

In [66]: temp2 = data.ravel().view('int32, int32')

In [67]: temp2.sort(order = ['f1', 'f0'])

Notice that since temp2 is a view of data, it does not require allocating new memory and copying the array. Also, sorting temp2 modifies data at the same time:

In [69]: data
Out[69]: 
matrix([[1, 0],
        [3, 2],
        [5, 4],
        [7, 6],
        [9, 8]])
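The 'int32, int32' string above matches the dtype shown for that session. If your integers are a different size (int64 is the default on many platforms), the field types have to match data.dtype, otherwise the view will not line up with the rows. Here is a sketch that builds the record dtype from the array itself; it assumes a C-contiguous plain ndarray with exactly two columns, and the names record_dtype and records are made up for the example:

```python
import numpy as np

# Hypothetical sample data as a plain, C-contiguous ndarray.
data = np.array([[1, 7],
                 [2, 1],
                 [-1, 3],
                 [0, 1]])

# One record per row; both fields reuse the array's own integer type.
record_dtype = np.dtype([('f0', data.dtype), ('f1', data.dtype)])

# reshape(-1) on a contiguous array returns a view, so records
# shares memory with data.
records = data.reshape(-1).view(record_dtype)

# Sort in place by the second column, breaking ties with the first.
records.sort(order=['f1', 'f0'])

print(data)
```

Because records is a view, data itself ends up sorted and no second array is allocated for the result.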

1 Comment

Thanks heaps @unutbu, you're a star! I like the second solution best; it is exactly what I wanted :) Thanks again

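You had the right idea, just off by a few characters. Use column index 1 (the second column) instead of 0, and take the column from data.A, the plain ndarray form of the matrix, so that argsort receives a 1-D array: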

>>> import numpy as np
>>> data = np.matrix([[9, 8],
...                   [7, 6],
...                   [5, 4],
...                   [3, 2],
...                   [1, 0]])
>>> data[np.argsort(data.A[:, 1])]
matrix([[1, 0],
        [3, 2],
        [5, 4],
        [7, 6],
        [9, 8]])
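To see why the matrix form matters, it can help to compare shapes. This is a small diagnostic sketch on the same sample data; the variable names are only illustrative:

```python
import numpy as np

data = np.matrix([[9, 8], [7, 6], [5, 4], [3, 2], [1, 0]])

# Slicing a column out of a matrix keeps it 2-D.
col_matrix = data[:, 0]
# The same kind of slice on the underlying ndarray is 1-D.
col_array = data.A[:, 1]

print(col_matrix.shape)
print(col_array.shape)

# argsort works along the last axis, so on an (n, 1) matrix every
# row is sorted on its own and each index comes back as 0.
bad_order = np.asarray(np.argsort(col_matrix)).ravel()
good_order = np.argsort(col_array)

print(bad_order)
print(good_order)
```

The all-zero index from the matrix slice is why the original expression did not produce a sorted result.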

1 Comment

Thanks for the clarification with this method @Bi Rico ;)
