Sort numpy array in-place using order from a second array

Question

Let two ndarrays: A of shape (n, *m), and B of shape (n, ). Is there a way to sort A in-place using the order that would sort B?

Sorting A with B is easy using np.argsort, but this is not done in-place:

A = A[np.argsort(B)]

Comments:

A and B have different dtypes, and A can have more than two dimensions. Hence they can’t be stacked to use ndarray.sort().
A takes up a lot of space, which is why it needs to be sorted in-place. Any solution requiring twice the space occupied by A would therefore defeat this purpose.
The title of this question “Re-arranging numpy array in place” may sound related, but the question itself is not very clear, and the answers do not match my question.

Looks like this is just an advanced indexing task, A[arr,:], indexing the rows of A. That necessarily is a copy. Even if you use A[:] = ... to write the new values back 'on-to' A, there will be a temporary buffer with the data. — hpaulj
– hpaulj, Commented Oct 21, 2018 at 17:02
@hpaulj: indeed, I was wondering if there is a way to do this without indexing and hence without creating such a buffer. — Arcturus B
– Arcturus B, Commented Oct 21, 2018 at 17:13

Paul Panzer · Accepted Answer · 2018-10-21 18:20:20Z

1

Here is a solution that works by following cycles in the index array. It can optionally be compiled using pythran giving a significant speedup if rows are small (80x for 10 elements) and a small speedup if rows are large (30% for 1000 elements).

To keep it pythran compatible I had to simplify it a bit, so it only accepts 2D arrays and it only sorts along axis 0.

Code:

import numpy as np

#pythran export take_inplace(float[:, :] or int[:, :], int[:])

def take_inplace(a, idx):
    n, m = a.shape
    been_there = np.zeros(n, bool)
    keep = np.empty(m, a.dtype)
    for i in range(n):
        if been_there[i]:
            continue
        keep[:] = a[i]
        been_there[i] = True
        j = i
        k = idx[i]
        while not been_there[k]:
            a[j] = a[k]
            been_there[k] = True
            j = k
            k = idx[k]
        a[j] = keep

Sample run using compiled version. As indicated above compilation is only required for small rows, for larger rows pure python should be fast enough.

>>> from timeit import timeit
>>> import numpy as np
>>> import take_inplace
>>> 
>>> a = np.random.random((1000, 10))
>>> idx = a[:, 4].argsort()
>>>
>>> take_inplace.take_inplace(a, idx)
>>>
# correct
>>> np.all(np.arange(1000) == a[:, 4].argsort())
True
>>>
# speed
>>> timeit(lambda: take_inplace.take_inplace(a, idx), number=1000)
0.011950935004279017
>>>
# for comparison
>>> timeit(lambda: a[idx], number=1000)
0.02985276997787878

edited Oct 21, 2018 at 18:20

answered Oct 21, 2018 at 18:02

Paul Panzer

53.3k3 gold badges59 silver badges103 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Arcturus B Over a year ago

This works perfectly, thanks a lot! I replaced the 1st line of take_inplace by n, *m = a.shape so that a could have 2 or more dimensions.

dobkind · Accepted Answer · 2018-10-21 15:32:45Z

1

If you can set A beforehand as a structured array whose datatype is composed of a subarray of shape (m, ) and a scalar of the same type (e.g., np.int32), then you can sort it in-place with respect to B. For example:

import numpy as np

B = np.array([3, 1, 2])
A = np.array([[10, 11], [20, 21], [30, 31]])

(n, m) = A.shape

dt = np.dtype([('a', np.int32, (m, )), ('b', int)])
A2 = np.array([(a, b) for a, b in zip(A, B)], dtype=dt)

A2.sort(order='b')
print A2

edited Oct 21, 2018 at 15:32

answered Oct 21, 2018 at 15:27

dobkind

4264 silver badges11 bronze badges

Collectives™ on Stack Overflow

Sort numpy array in-place using order from a second array

2 Answers 2

1 Comment

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related