3

Let two ndarrays: A of shape (n, *m), and B of shape (n, ). Is there a way to sort A in-place using the order that would sort B?

Sorting A with B is easy using np.argsort, but this is not done in-place:

A = A[np.argsort(B)]

Comments:

  • A and B have different dtypes, and A can have more than two dimensions. Hence they can’t be stacked to use ndarray.sort().
  • A takes up a lot of space, which is why it needs to be sorted in-place. Any solution requiring twice the space occupied by A would therefore defeat this purpose.
  • The title of this question “Re-arranging numpy array in place” may sound related, but the question itself is not very clear, and the answers do not match my question.
2
  • Looks like this is just an advanced indexing task, A[arr,:], indexing the rows of A. That necessarily is a copy. Even if you use A[:] = ... to write the new values back 'on-to' A, there will be a temporary buffer with the data. Commented Oct 21, 2018 at 17:02
  • @hpaulj: indeed, I was wondering if there is a way to do this without indexing and hence without creating such a buffer. Commented Oct 21, 2018 at 17:13

2 Answers 2

1

Here is a solution that works by following cycles in the index array. It can optionally be compiled using pythran giving a significant speedup if rows are small (80x for 10 elements) and a small speedup if rows are large (30% for 1000 elements).

To keep it pythran compatible I had to simplify it a bit, so it only accepts 2D arrays and it only sorts along axis 0.

Code:

import numpy as np

#pythran export take_inplace(float[:, :] or int[:, :], int[:])

def take_inplace(a, idx):
    n, m = a.shape
    been_there = np.zeros(n, bool)
    keep = np.empty(m, a.dtype)
    for i in range(n):
        if been_there[i]:
            continue
        keep[:] = a[i]
        been_there[i] = True
        j = i
        k = idx[i]
        while not been_there[k]:
            a[j] = a[k]
            been_there[k] = True
            j = k
            k = idx[k]
        a[j] = keep

Sample run using compiled version. As indicated above compilation is only required for small rows, for larger rows pure python should be fast enough.

>>> from timeit import timeit
>>> import numpy as np
>>> import take_inplace
>>> 
>>> a = np.random.random((1000, 10))
>>> idx = a[:, 4].argsort()
>>>
>>> take_inplace.take_inplace(a, idx)
>>>
# correct
>>> np.all(np.arange(1000) == a[:, 4].argsort())
True
>>>
# speed
>>> timeit(lambda: take_inplace.take_inplace(a, idx), number=1000)
0.011950935004279017
>>>
# for comparison
>>> timeit(lambda: a[idx], number=1000)
0.02985276997787878
Sign up to request clarification or add additional context in comments.

1 Comment

This works perfectly, thanks a lot! I replaced the 1st line of take_inplace by n, *m = a.shape so that a could have 2 or more dimensions.
1

If you can set A beforehand as a structured array whose datatype is composed of a subarray of shape (m, ) and a scalar of the same type (e.g., np.int32), then you can sort it in-place with respect to B. For example:

import numpy as np

B = np.array([3, 1, 2])
A = np.array([[10, 11], [20, 21], [30, 31]])

(n, m) = A.shape

dt = np.dtype([('a', np.int32, (m, )), ('b', int)])
A2 = np.array([(a, b) for a, b in zip(A, B)], dtype=dt)

A2.sort(order='b')
print A2

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.