
I have a 1d array of ids, for example:

a = [1, 3, 4, 7, 9]

Then another 2d array:

b = [[1, 4, 7, 9], [3, 7, 9, 1]]

I would like to have a third array with the same shape of b where each item is the index of the corresponding item from a, that is:

c = [[0, 2, 3, 4], [1, 3, 4, 0]]

What's a vectorized way to do that using numpy?

np.searchsorted(a, b), IIUC. a needs to be sorted for this approach. lu = np.empty(max(a)+1, a.dtype); lu[a] = np.arange(len(a)); lu[np.array(b)] should be faster for larger arrays, but needs additional space for the look-up array. Commented May 31, 2022 at 18:31
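As a sketch, the two vectorized approaches from the comment above look like this (assuming, as the comment notes, that a is sorted for the searchsorted variant):

```python
import numpy as np

a = np.array([1, 3, 4, 7, 9])   # must be sorted for np.searchsorted
b = np.array([[1, 4, 7, 9], [3, 7, 9, 1]])

# 1) Binary search: O(len(b) * log(len(a))) time, no extra space.
c = np.searchsorted(a, b)

# 2) Look-up table: linear time, but needs O(max(a)) extra space.
lu = np.empty(a.max() + 1, dtype=np.intp)
lu[a] = np.arange(len(a))       # lu[value] -> index of value in a
c2 = lu[b]
```

Both produce an array with the shape of b holding indices into a; the look-up table trades memory for speed when a's values are not too large.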

3 Answers


This may not be the obvious tool for the job, but you can use np.interp to do that:

import numpy as np

a = [1, 3, 4, 7, 9]
b = [[1, 4, 7, 9], [3, 7, 9, 1]]

sorting = np.argsort(a)
positions = np.arange(len(a))
xp = np.array(a)[sorting]
fp = positions[sorting]
# round first, then cast: rounding is safer than truncating with a bare
# astype(int), because floats are tricky
c = np.rint(np.interp(b, xp, fp)).astype(int)

This should work as long as len(a) is smaller than the largest integer exactly representable by a float (16,777,217 for float32), and the algorithm runs in O(n log n) time, or more precisely O(len(b) * log(len(a))).




Effectively, this solution is a one-liner. The only catch is that you need to flatten the array before applying the one-liner, and then reshape it back afterwards:

import numpy as np

a = np.array([1, 3, 4, 7, 9])
b = np.array([[1, 4, 7, 9], [3, 7, 9, 1]])
original_shape = b.shape

c = np.where(b.reshape(b.size, 1) == a)[1]

c = c.reshape(original_shape)

This results in:

[[0 2 3 4]
 [1 3 4 0]]



Broadcasting to the rescue!

>>> ((np.arange(1, len(a) + 1)[:, None, None]) * (a[:, None, None] == b)).sum(axis=0) - 1
array([[0, 2, 3, 4],
       [1, 3, 4, 0]])
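Spelled out step by step (with a and b as NumPy arrays, which the broadcasting requires), the one-liner above does the following:

```python
import numpy as np

a = np.array([1, 3, 4, 7, 9])
b = np.array([[1, 4, 7, 9], [3, 7, 9, 1]])

# (len(a), 1, 1) against (2, 4) broadcasts to a (len(a), 2, 4) boolean cube;
# mask[i] marks where b equals a[i]
mask = a[:, None, None] == b
# weight each match by its 1-based position in a, then collapse the cube
idx = np.arange(1, len(a) + 1)[:, None, None]
c = (idx * mask).sum(axis=0) - 1   # -1 shifts back to 0-based indices
```

The 1-based weights exist so that a match at index 0 is distinguishable from "no match" before the final subtraction.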


While this solution is simple, it is very inefficient for big arrays: it runs in quadratic O(len(a) * len(b)) time and uses a quadratic amount of memory. Thus, computing a 100_000-sized array will simply crash on most machines. The solution of MichaelSzczesny runs in (quasi-)linear time and uses much less memory.
Ha, oh well. At least it was fun to develop :)
Not only is it memory-inefficient, as @JérômeRichard said; I believe these math operations will be expensive for large arrays as well (even though your code is very elegant). The same kind of problem arises when one uses Kronecker products (with np.kron) to increase the dimension of a matrix instead of np.repmat, for instance.
