
I want to find the indices of the rows of a main 2D NumPy array A that also appear in another 2D array B.

A=array([[1, 2],
         [3, 4],
         [5, 6],
         [7, 8],
         [9, 10]])

B=array([[1, 4],
         [1, 2],
         [5, 6],
         [6, 3]])

result=[0,2]

This should return [0, 2], the indices of the rows of A that also appear in B.

How can this be done efficiently for 2d arrays?

Thank you!

edit

I have tried the function:

k[np.in1d(k.view(dtype='i,i').reshape(k.shape[0]),k2.view(dtype='i,i').reshape(k2.shape[0]))]

from "Implementation of numpy in1d for 2D arrays?", but I get a reshape error. My data are floats (with two decimals). I also tried using sets, but the performance was quite slow.

  • 1
    Is there anything you tried yourself that didn't work? Commented May 22, 2014 at 18:37
  • 1
    Yes I tried k[np.in1d(k.view(dtype='i,i').reshape(k.shape[0]),k2.view(dtype='i,i').reshape(k2.shape[0]))] from stackoverflow.com/questions/16210738/numpy-in1d-for-2d-arrays. But I get a reshape error. Commented May 22, 2014 at 18:39
  • 1
    Ah ok, can you edit that in to the question so everyone can see it clearly? Commented May 22, 2014 at 18:40
  • 1
    Why don't you just iterate through array A, keeping track of your index, and then check A[i] in B? You could even convert B to a set (the sub lists would need to become tuples) so that the membership check is constant time. Commented May 22, 2014 at 18:48
  • 1
    I thought this is kind of inefficient. Commented May 22, 2014 at 18:53
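For reference, the loop-and-set idea from the comment above can be sketched as follows. This is only an illustration using the arrays from the question; the names b_rows and result are made up for the example, and for large arrays the vectorized approaches in the answers are likely to be faster.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
B = np.array([[1, 4], [1, 2], [5, 6], [6, 3]])

# Rows of B as hashable tuples, so each membership test is O(1) on average.
b_rows = set(map(tuple, B.tolist()))

# Indices of the rows of A that also occur in B.
result = [i for i, row in enumerate(A.tolist()) if tuple(row) in b_rows]
print(result)  # [0, 2]
```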

2 Answers


With minimal changes, you can get your approach to work:

In [15]: A
Out[15]: 
array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10]])

In [16]: B
Out[16]: 
array([[1, 4],
       [1, 2],
       [5, 6],
       [6, 3]])

In [17]: np.in1d(A.view('i,i').reshape(-1), B.view('i,i').reshape(-1))
Out[17]: array([ True, False,  True, False, False], dtype=bool)

In [18]: np.nonzero(np.in1d(A.view('i,i').reshape(-1), B.view('i,i').reshape(-1)))
Out[18]: (array([0, 2], dtype=int64),)

In [19]: np.nonzero(np.in1d(A.view('i,i').reshape(-1), B.view('i,i').reshape(-1)))[0]
Out[19]: array([0, 2], dtype=int64)
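Note that the 'i,i' view only lines up with the rows when the arrays hold two 32-bit integers per row. A minimal sketch of the same idea that builds the structured dtype from the array itself is shown below. The helper name rows_as_records is hypothetical, both arrays are assumed to share the same dtype and column count, and np.isin is used because the NumPy documentation recommends it over np.in1d for new code.

```python
import numpy as np

def rows_as_records(arr):
    """View each row of a 2-D array as one structured element (hypothetical helper)."""
    arr = np.ascontiguousarray(arr)  # the view below needs contiguous rows
    # One field per column, each using the array's own dtype.
    dt = np.dtype([(f"f{i}", arr.dtype) for i in range(arr.shape[1])])
    return arr.view(dt).reshape(-1)

A = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
B = np.array([[1, 4], [1, 2], [5, 6], [6, 3]])

# Assumption: A and B have the same dtype and the same number of columns.
mask = np.isin(rows_as_records(A), rows_as_records(B))
idx = np.nonzero(mask)[0]
print(idx)  # [0 2]
```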

If your arrays are not floats, and are both contiguous, then the following will be faster:

In [21]: dt = np.dtype((np.void, A.dtype.itemsize * A.shape[1]))

In [22]: np.nonzero(np.in1d(A.view(dt).reshape(-1), B.view(dt).reshape(-1)))[0]
Out[22]: array([0, 2], dtype=int64)
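To spell out what In [21] and In [22] do: dt is an opaque (void) dtype exactly as wide as one row, so each row becomes a single element that is compared byte by byte. The sketch below repeats that step with comments, and adds a small check of why floats are risky with a bytewise comparison. It assumes A and B share the same dtype and column count, and it uses np.isin in place of np.in1d.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
B = np.array([[1, 4], [1, 2], [5, 6], [6, 3]])

# A void dtype as wide as one full row: bytes per element times number of columns.
dt = np.dtype((np.void, A.dtype.itemsize * A.shape[1]))

# The view reinterprets memory in place, so the rows must be C-contiguous.
a_rows = np.ascontiguousarray(A).view(dt).reshape(-1)
b_rows = np.ascontiguousarray(B).view(dt).reshape(-1)

idx = np.nonzero(np.isin(a_rows, b_rows))[0]
print(idx)  # [0 2]

# Why floats are risky: 0.0 and -0.0 compare equal as numbers,
# but their byte patterns differ, so a bytewise match would miss them.
zero = np.array([0.0])
neg_zero = np.array([-0.0])
same_value = bool((zero == neg_zero).all())
same_bytes = zero.tobytes() == neg_zero.tobytes()
print(same_value, same_bytes)  # True False
```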

And a quick timing:

In [24]: %timeit np.nonzero(np.in1d(A.view('i,i').reshape(-1), B.view('i,i').reshape(-1)))[0]
10000 loops, best of 3: 75 µs per loop

In [25]: %timeit np.nonzero(np.in1d(A.view(dt).reshape(-1), B.view(dt).reshape(-1)))[0]
10000 loops, best of 3: 29.8 µs per loop

1 Comment

Can you please explain lines 21 and 22? It seems as if you are coercing to some other datatype and setting A to the same format. However, when I try this with my own 2D array C, which has shape (25257, 4) and dtype('<f8'), I get an error with C.view(dt).
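For float data such as the two-decimal values mentioned in the question, one possible workaround is to convert to integer hundredths first and then compare rows. The sketch below is an assumption-laden illustration, not part of the answer: the function name matching_row_indices, the decimals parameter, and the sample arrays C and D are all made up. The broadcast comparison builds an n-by-m intermediate, so it only suits inputs where that fits in memory.

```python
import numpy as np

def matching_row_indices(a, b, decimals=2):
    """Indices of rows of `a` that also occur in `b` (hypothetical helper)."""
    scale = 10 ** decimals
    # Round to integer units so that equality no longer depends on float noise.
    a_int = np.rint(np.asarray(a) * scale).astype(np.int64)
    b_int = np.rint(np.asarray(b) * scale).astype(np.int64)
    # Shapes (n, 1, k) and (1, m, k) broadcast to (n, m, k); a row matches
    # when all k columns agree, giving an (n, m) boolean table.
    eq = (a_int[:, None, :] == b_int[None, :, :]).all(axis=2)
    return np.nonzero(eq.any(axis=1))[0]

# Illustrative data only, not taken from the question.
C = np.array([[1.10, 2.25], [3.30, 4.40], [5.55, 6.65]])
D = np.array([[5.55, 6.65], [1.10, 2.25], [0.10, 0.20]])
print(matching_row_indices(C, D))  # [0 2]
```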

You can use np.char.array() objects to do this comparison using np.in1d():

s1 = np.char.array(A[:,0]) + '-' + np.char.array(A[:,1])
s2 = np.char.array(B[:,0]) + '-' + np.char.array(B[:,1])

np.where(np.in1d(s1, s2))[0]
#array([0, 2], dtype=int64)

NOTE: A and B must be of the same data type (int, float, etc) for this to work.
