I have two very large 3D numpy arrays. I need an efficient way to check whether they overlap, because converting both into sets first takes too long. I tried a solution I found here for the same problem with 2D arrays, but I couldn't get it to work for 3D. Here is the 2D solution:

import numpy as np

nrows, ncols = A.shape
# Structured dtype with one field per column, so each row becomes a single element
dtype={'names':['f{}'.format(i) for i in range(ncols)],
       'formats':ncols * [A.dtype]}
C = np.intersect1d(A.view(dtype), B.view(dtype))
# This last bit is optional if you're okay with "C" being a structured array...
C = C.view(A.dtype).reshape(-1, ncols)

(where A and B are the 2D arrays.) I only need the number of overlapping subarrays, not which specific ones overlap.

  • Not sure if that's what you've intended but you can check intersection for each dim and then intersect the result Commented Feb 20, 2019 at 17:22
  • No. In my scenario there's an object in the second and third dimensions. I want to check if those objects appear in the other array, and if so how many. Commented Feb 20, 2019 at 17:28
  • How would you define if two 3D arrays are intersecting? Can you add minimal sample data? Commented Feb 20, 2019 at 17:30
  • What do you mean by "intersecting"? Mathematically, this concept only applies to sets, not to matrices. Commented Feb 20, 2019 at 17:31
  • The objects are images. The images are 2D and are stored in an array. I want to check whether some of the images in one array also appear in the other. Sorry for the unclear explanation earlier. Commented Feb 20, 2019 at 17:33

1 Answer

We could leverage views using a helper function that I have used across a few Q&As. To test for the presence of subarrays, we could use np.isin on the views, or a more laborious approach with np.searchsorted.

Approach #1 : Using np.isin -

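import numpy as np

# https://stackoverflow.com/a/45313353/ @Divakar
def view1D(a, b): # a, b are 2D arrays with the same dtype and number of columns
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    # One void element spans the bytes of a full row, so each row becomes a scalar
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(),  b.view(void_dt).ravel()

def isin_nd(a,b):
    # a,b are the 3D input arrays to give us "isin-like" functionality across them
    # Flatten each 2D subarray into a row, then test membership of a's rows in b
    A,B = view1D(a.reshape(a.shape[0],-1),b.reshape(b.shape[0],-1))
    return np.isin(A,B)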
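
To make the view trick more concrete, here is a small sketch showing how reshaping and viewing as a void dtype collapses each 2D subarray into a single comparable element. The toy array and its values are made up for illustration only:

```python
import numpy as np

# Hypothetical toy input: three 2x2 "images"; the first and last are identical
a = np.array([[[1, 2], [3, 4]],
              [[5, 6], [7, 8]],
              [[1, 2], [3, 4]]])

# Flatten each subarray into one row, then view each row's bytes as one element
rows = np.ascontiguousarray(a.reshape(a.shape[0], -1))
void_dt = np.dtype((np.void, rows.dtype.itemsize * rows.shape[1]))
A = rows.view(void_dt).ravel()

print(rows.shape, A.shape)   # (3, 4) (3,)

# Element-wise comparison now compares whole subarrays byte for byte
first_equals_last = bool((A[:1] == A[2:])[0])
first_equals_second = bool((A[:1] == A[1:2])[0])
print(first_equals_last, first_equals_second)   # True False
```

Because the comparison is byte-wise, both arrays need the same dtype and the same subarray shape for the views to be comparable.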

Approach #2 : We could also leverage np.searchsorted upon the views -

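def isin_nd_searchsorted(a,b):
    # a,b are the 3D input arrays
    A,B = view1D(a.reshape(a.shape[0],-1),b.reshape(b.shape[0],-1))
    # Sort b's rows once, then binary-search each row of a among them
    sidx = B.argsort()
    sorted_index = np.searchsorted(B,A,sorter=sidx)
    # Clip positions past the end so they stay valid indices
    sorted_index[sorted_index==len(B)] = len(B)-1
    idx = sidx[sorted_index]
    # A row of a is present in b only if the located row of b matches exactly
    return B[idx] == A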

Both solutions return a boolean mask marking which subarrays of a are also present in b. The desired count is therefore isin_nd(a,b).sum() or isin_nd_searchsorted(a,b).sum().
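
One caveat worth keeping in mind: a mask with one entry per subarray counts repeated subarrays separately. If what's needed is the number of distinct shared subarrays, one possible sketch is to deduplicate with np.unique(..., axis=0) first. Note that count_distinct_common is a hypothetical helper, not part of the approaches above, and it assumes both arrays share the same dtype and subarray shape:

```python
import numpy as np

def count_distinct_common(a, b):
    # Hypothetical helper: number of distinct subarrays present in both inputs
    a2 = np.unique(a.reshape(len(a), -1), axis=0)   # distinct rows of a
    b2 = np.unique(b.reshape(len(b), -1), axis=0)   # distinct rows of b
    # After deduplication, a row seen twice in the stack must be in both inputs
    _, counts = np.unique(np.concatenate([a2, b2]), axis=0, return_counts=True)
    return int((counts == 2).sum())

# Made-up data: a holds the all-zeros subarray twice, b holds it once
s0 = np.zeros((2, 2), dtype=int)
s1 = np.ones((2, 2), dtype=int)
s2 = np.full((2, 2), 2)
a = np.stack([s0, s0, s1])
b = np.stack([s0, s2])

print(count_distinct_common(a, b))   # 1
```

Here a mask over a would mark two entries, while only one distinct subarray is shared.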

Sample run -

In [71]: # Setup with 3 common "subarrays"
    ...: np.random.seed(0)
    ...: a = np.random.randint(0,9,(10,4,5))
    ...: b = np.random.randint(0,9,(7,4,5))
    ...: 
    ...: b[1] = a[4]
    ...: b[3] = a[2]
    ...: b[6] = a[0]

In [72]: isin_nd(a,b).sum()
Out[72]: 3

In [73]: isin_nd_searchsorted(a,b).sum()
Out[73]: 3
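
For small inputs like this one, the result can be cross-checked against a plain broadcasting comparison. This is only a sanity check on the same setup as above (isin_nd_bruteforce is a hypothetical name). It materializes every pairwise comparison, so it is not meant for the large arrays in the question:

```python
import numpy as np

def isin_nd_bruteforce(a, b):
    # Compare every subarray of a against every subarray of b
    eq = a[:, None] == b[None]   # shape (len(a), len(b), rows, cols)
    # A subarray of a is present if it fully matches at least one subarray of b
    return eq.all(axis=(2, 3)).any(axis=1)

# Same setup as the sample run: 3 common "subarrays"
np.random.seed(0)
a = np.random.randint(0, 9, (10, 4, 5))
b = np.random.randint(0, 9, (7, 4, 5))
b[1] = a[4]
b[3] = a[2]
b[6] = a[0]

mask = isin_nd_bruteforce(a, b)
print(mask.sum())             # 3
print(np.flatnonzero(mask))   # [0 2 4]
```

np.flatnonzero on the mask also gives the positions of the matching subarrays in a, in case those are needed later.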

Timings on large arrays -

In [74]: # Setup
    ...: np.random.seed(0)
    ...: a = np.random.randint(0,9,(100,100,100))
    ...: b = np.random.randint(0,9,(100,100,100))
    ...: idxa = np.random.choice(range(len(a)), len(a)//2, replace=False)
    ...: idxb = np.random.choice(range(len(b)), len(b)//2, replace=False)
    ...: a[idxa] = b[idxb]

# Verify output
In [82]: np.allclose(isin_nd(a,b),isin_nd_searchsorted(a,b))
Out[82]: True

In [75]: %timeit isin_nd(a,b).sum()
10 loops, best of 3: 31.2 ms per loop

In [76]: %timeit isin_nd_searchsorted(a,b).sum()
100 loops, best of 3: 1.98 ms per loop
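
Outside IPython, %timeit isn't available. A rough equivalent using the standard timeit module might look like the sketch below. It re-declares the np.isin-based function so that it runs standalone, and the numbers it prints will depend on the machine and the NumPy version:

```python
import timeit
import numpy as np

def isin_nd(a, b):
    # Same idea as Approach #1: view each flattened subarray as one void element
    a2 = np.ascontiguousarray(a.reshape(a.shape[0], -1))
    b2 = np.ascontiguousarray(b.reshape(b.shape[0], -1))
    void_dt = np.dtype((np.void, a2.dtype.itemsize * a2.shape[1]))
    return np.isin(a2.view(void_dt).ravel(), b2.view(void_dt).ravel())

# Same setup as the timings above: half of a's subarrays are copied from b
np.random.seed(0)
a = np.random.randint(0, 9, (100, 100, 100))
b = np.random.randint(0, 9, (100, 100, 100))
idxa = np.random.choice(len(a), len(a) // 2, replace=False)
idxb = np.random.choice(len(b), len(b) // 2, replace=False)
a[idxa] = b[idxb]

count = int(isin_nd(a, b).sum())
# Best of 3 repeats with 10 calls each, reported per call in milliseconds
best = min(timeit.repeat(lambda: isin_nd(a, b).sum(), repeat=3, number=10)) / 10
print(count, "common subarrays;", round(best * 1e3, 2), "ms per call")
```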