Get indices of intersecting rows of Numpy 2d Array

Question

I want to get the indices of the intersecting rows of a main numpy 2d array A, with another one B.

A=array([[1, 2],
         [3, 4],
         [5, 6],
         [7, 8],
         [9, 10]])

B=array([[1, 4],
         [1, 2],
         [5, 6],
         [6, 3]])

result=[0,2]

This should return [0,2], the indices of the rows of A that also appear in B.

How can this be done efficiently for 2d arrays?

Thank you!

Edit

I have tried the following expression:

k[np.in1d(k.view(dtype='i,i').reshape(k.shape[0]), k2.view(dtype='i,i').reshape(k2.shape[0]))]

taken from "Implementation of numpy in1d for 2D arrays?", but I get a reshape error. My data are floats (with two decimal places). I also tried using sets, but that was quite slow.
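Since the data are floats with two decimals, one option is to scale them to integers first, so that rows are compared exactly, and then match rows by broadcasting. This is only a sketch and is not taken from the answers below. The names `rows_in`, `k_int` and `k2_int` and the sample values are made up for illustration, and it assumes every value really has at most two decimals.

```python
import numpy as np

def rows_in(a, b):
    """Return indices of rows of `a` that also appear as rows of `b`."""
    # Compare every row of a with every row of b.
    # Intermediate shape: (len(a), len(b), ncols), reduced over the columns.
    matches = (a[:, None, :] == b[None, :, :]).all(axis=2)
    return np.flatnonzero(matches.any(axis=1))

# Hypothetical sample data with two decimals.
k = np.array([[1.10, 2.25], [3.00, 4.50], [5.75, 6.10]])
k2 = np.array([[5.75, 6.10], [1.10, 2.25], [0.30, 0.10]])

# Assumption: at most two decimals, so scaling by 100 and rounding is lossless.
k_int = np.rint(k * 100).astype(np.int64)
k2_int = np.rint(k2 * 100).astype(np.int64)

idx = rows_in(k_int, k2_int)
print(idx)  # [0 2]
```

The broadcast comparison builds a len(k) x len(k2) x ncols boolean array, so it probably only suits moderately sized inputs.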

Yes, I tried k[np.in1d(k.view(dtype='i,i').reshape(k.shape[0]),k2.view(dtype='i,i').reshape(k2.shape[0]))] from stackoverflow.com/questions/16210738/numpy-in1d-for-2d-arrays. But I get a reshape error.
– Yannis Assael, Commented May 22, 2014 at 18:39
Ah ok, can you edit that into the question so everyone can see it clearly?
– Tim, Commented May 22, 2014 at 18:40
Why don't you just iterate through array A, keeping track of your index, and then check A[i] in B? You could even convert B to a set (the sub lists would need to become tuples) so that the membership check is constant time.
– Nacho, Commented May 22, 2014 at 18:48
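The set-based idea from the last comment could look roughly like the following. It is a plain-Python sketch using the arrays from the question, and the variable names `b_rows` and `result` are chosen here for illustration. It is easy to read, but as the question notes, it can be slow for large arrays.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
B = np.array([[1, 4], [1, 2], [5, 6], [6, 3]])

# Rows must be hashable to go into a set, so convert each row to a tuple.
b_rows = set(map(tuple, B.tolist()))

# Keep the index of every row of A that is also a row of B.
result = [i for i, row in enumerate(A.tolist()) if tuple(row) in b_rows]
print(result)  # [0, 2]
```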

Jaime · Accepted Answer · 2014-05-22 20:12:50Z

5

With minimal changes, you can get your approach to work:

In [15]: A
Out[15]: 
array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10]])

In [16]: B
Out[16]: 
array([[1, 4],
       [1, 2],
       [5, 6],
       [6, 3]])

In [17]: np.in1d(A.view('i,i').reshape(-1), B.view('i,i').reshape(-1))
Out[17]: array([ True, False,  True, False, False], dtype=bool)

In [18]: np.nonzero(np.in1d(A.view('i,i').reshape(-1), B.view('i,i').reshape(-1)))
Out[18]: (array([0, 2], dtype=int64),)

In [19]: np.nonzero(np.in1d(A.view('i,i').reshape(-1), B.view('i,i').reshape(-1)))[0]
Out[19]: array([0, 2], dtype=int64)

If your arrays are not floats, are both C-contiguous, and share the same dtype, then viewing each row as a single np.void item will be faster:

In [21]: dt = np.dtype((np.void, A.dtype.itemsize * A.shape[1]))

In [22]: np.nonzero(np.in1d(A.view(dt).reshape(-1), B.view(dt).reshape(-1)))[0]
Out[22]: array([0, 2], dtype=int64)

And a quick timing:

In [24]: %timeit np.nonzero(np.in1d(A.view('i,i').reshape(-1), B.view('i,i').reshape(-1)))[0]
10000 loops, best of 3: 75 µs per loop

In [25]: %timeit np.nonzero(np.in1d(A.view(dt).reshape(-1), B.view(dt).reshape(-1)))[0]
10000 loops, best of 3: 29.8 µs per loop


Can you please explain lines 21 and 22? It seems as if you are coercing to some other datatype and setting A as the same format. However, when I try with my own 2D array (call it C) of shape (25257, 4) and dtype('<f8'), I get an error with C.view(dt).
– Anna, Commented over a year ago
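On lines 21 and 22: `dt` is a np.void dtype whose size equals the byte length of one row (itemsize times the number of columns). `A.view(dt)` reinterprets each row as one opaque item without copying, so whole rows can be compared at once. The comment does not show the traceback, so the cause of the error is an assumption here, but a common one is that the array is not C-contiguous (for example, it is a slice or a transpose). The sketch below copies to a contiguous buffer first and uses `np.isin`, which recent NumPy releases prefer over the deprecated `np.in1d`. The helper names are hypothetical.

```python
import numpy as np

def as_row_keys(arr):
    """View each row of a 2-D array as one fixed-size item of raw bytes."""
    arr = np.ascontiguousarray(arr)  # makes a copy only if rows are not contiguous
    dt = np.dtype((np.void, arr.dtype.itemsize * arr.shape[1]))
    return arr.view(dt).reshape(-1)

def intersecting_row_indices(a, b):
    # Assumption: a and b have the same dtype and the same number of columns.
    mask = np.isin(as_row_keys(a), as_row_keys(b))
    return np.flatnonzero(mask)

# Hypothetical float64 data with four columns, as in the comment.
C = np.arange(24, dtype=np.float64).reshape(6, 4)
D = C[::2]  # rows 0, 2, 4 of C, as a non-contiguous slice

idx = intersecting_row_indices(C, D)
print(idx)  # [0 2 4]
```

Note that this compares raw bytes, which is why the answer restricts it to non-float data: 0.0 and -0.0 have different bytes and would not match, while NaNs with identical bit patterns would.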

Saullo G. P. Castro · Answer · 2014-05-22 19:01:18Z

2

You can use np.char.array() objects to do this comparison using np.in1d():

s1 = np.char.array(A[:,0]) + '-' + np.char.array(A[:,1])
s2 = np.char.array(B[:,0]) + '-' + np.char.array(B[:,1])

np.where(np.in1d(s1, s2))[0]
#array([0, 2], dtype=int64)

NOTE: A and B must be of the same data type (int, float, etc) for this to work.
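To illustrate the note about data types, here is a sketch of the same string-key idea written with `astype(str)` and `np.char.add` instead of `np.char.array()`. That is a substitution made here, not the answer's exact code, and the helper name `row_strings` is hypothetical. It shows that identical values stored with different dtypes produce different keys.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
B = np.array([[1, 4], [1, 2], [5, 6], [6, 3]])

def row_strings(arr):
    """Join the two columns of each row into one string key such as '1-2'."""
    left = arr[:, 0].astype(str)
    right = arr[:, 1].astype(str)
    return np.char.add(np.char.add(left, '-'), right)

idx = np.flatnonzero(np.isin(row_strings(A), row_strings(B)))
print(idx)  # [0 2]

# The same values stored as floats give keys like '1.0-2.0' instead of '1-2',
# so nothing matches when the dtypes differ.
mixed = np.flatnonzero(np.isin(row_strings(A.astype(float)), row_strings(B)))
print(mixed)  # []
```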
