
I have a 1d array of ids, for example:

a = [1, 3, 4, 7, 9]

Then another 2d array:

b = [[1, 4, 7, 9], [3, 7, 9, 1]]

I would like to have a third array with the same shape of b where each item is the index of the corresponding item from a, that is:

c = [[0, 2, 3, 4], [1, 3, 4, 0]]

What's a vectorized way to do that using numpy?

np.searchsorted(a, b), IIUC. a needs to be sorted for this approach. lu = np.empty(max(a)+1, a.dtype); lu[a] = np.arange(len(a)); lu[np.array(b)] should be faster for larger arrays, but needs additional space for the look-up array. Commented May 31, 2022 at 18:31
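As a sketch, the two vectorized approaches from the comment above look like this (assuming, as the comment notes, that a is sorted for the searchsorted variant):

```python
import numpy as np

a = np.array([1, 3, 4, 7, 9])   # must be sorted for np.searchsorted
b = np.array([[1, 4, 7, 9], [3, 7, 9, 1]])

# 1) Binary search: O(len(b) * log(len(a))) time, no extra space.
c = np.searchsorted(a, b)

# 2) Look-up table: linear time, but needs O(max(a)) extra space.
lu = np.empty(a.max() + 1, dtype=np.intp)
lu[a] = np.arange(len(a))       # lu[value] -> index of value in a
c2 = lu[b]
```

Both produce an array with the shape of b holding indices into a; the look-up table trades memory for speed when a's values are not too large.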

3 Answers


This may not be the obvious tool for the job, but you can use np.interp to do that:

import numpy as np

a = [1, 3, 4, 7, 9]
b = [[1, 4, 7, 9], [3, 7, 9, 1]]

sorting = np.argsort(a)
positions = np.arange(len(a))
xp = np.array(a)[sorting]
fp = positions[sorting]
# round first, then cast: rounding is safer than truncating with a bare
# astype(int), because floats are tricky
c = np.rint(np.interp(b, xp, fp)).astype(int)

This should work as long as len(a) is smaller than the largest integer exactly representable by a float (16,777,217 for float32), and the algorithm runs in O(n log n) time, or more precisely O(len(b) * log(len(a))).




Effectively, this solution is a one-liner. The only catch is that you need to flatten the array before applying the one-liner, and then reshape it back afterwards:

import numpy as np

a = np.array([1, 3, 4, 7, 9])
b = np.array([[1, 4, 7, 9], [3, 7, 9, 1]])
original_shape = b.shape

c = np.where(b.reshape(b.size, 1) == a)[1]

c = c.reshape(original_shape)

This results in:

[[0 2 3 4]
 [1 3 4 0]]



Broadcasting to the rescue!

>>> ((np.arange(1, len(a) + 1)[:, None, None]) * (a[:, None, None] == b)).sum(axis=0) - 1
array([[0, 2, 3, 4],
       [1, 3, 4, 0]])
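Spelled out step by step (with a and b as NumPy arrays, which the broadcasting requires), the one-liner above does the following:

```python
import numpy as np

a = np.array([1, 3, 4, 7, 9])
b = np.array([[1, 4, 7, 9], [3, 7, 9, 1]])

# (len(a), 1, 1) against (2, 4) broadcasts to a (len(a), 2, 4) boolean cube;
# mask[i] marks where b equals a[i]
mask = a[:, None, None] == b
# weight each match by its 1-based position in a, then collapse the cube
idx = np.arange(1, len(a) + 1)[:, None, None]
c = (idx * mask).sum(axis=0) - 1   # -1 shifts back to 0-based indices
```

The 1-based weights exist so that a match at index 0 is distinguishable from "no match" before the final subtraction.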


While this solution is simple, it is very inefficient for big arrays: it runs in quadratic O(len(a) * len(b)) time and uses a quadratic amount of memory. Thus, computing a 100_000-sized array will simply crash on most machines. The solution of MichaelSzczesny runs in (quasi-)linear time and uses much less memory.
Ha, oh well. At least it was fun to develop :)
Not only is it memory-inefficient, as @JérômeRichard said; I believe these math operations will be expensive for large arrays as well (even though your code is very elegant). The same kind of problem arises when one uses Kronecker products (with np.kron) to increase the dimension of a matrix instead of np.repmat, for instance.
