
My problem

Suppose I have

import numpy as np

a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])

They are two arrays of different lengths, each containing inner arrays (all inner arrays have the same size).

I want to count how many items of b (i.e. inner arrays) also appear in a, regardless of their position.

How can I do that?

My Try

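count = 0
for bitem in b:
    for aitem in a:
        # aitem == bitem is element-wise and cannot be used directly in an if,
        # so compare the whole inner arrays instead
        if np.array_equal(aitem, bitem):
            count += 1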

Is there a better way, ideally a one-liner, perhaps with a comprehension?

2 Comments

Got to upvote for the title alone. (Commented Aug 29, 2017 at 10:02)
Thanks man, appreciate that. (Commented Aug 29, 2017 at 10:07)

4 Answers


The numpy_indexed package contains efficient (generally O(n log n)) and vectorized solutions to these types of problems:

import numpy_indexed as npi
count = len(npi.intersection(a, b))

Note that this is subtly different from your double loop: for instance, it discards duplicate entries in a and b. If you want to retain duplicates in b, this would work:

count = npi.in_(b, a).sum()

Duplicate entries in a could also be handled by calling npi.count(a) and factoring in its result. I mention this only for illustration, since I imagine the distinction probably does not matter to you.
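To make the duplicate distinction concrete, here is a minimal sketch in plain NumPy rather than numpy_indexed. The array b below is a hypothetical variant of the one in the question, with one row repeated so that the two counts differ; the variable names are mine.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
# Hypothetical b with the row [5, 6] repeated, so the two counts differ.
b = np.array([[5, 6], [1, 2], [5, 6], [3, 192]])

# For each row of b: does it equal some row of a? Result has shape (len(b),).
b_in_a = (b[:, None, :] == a[None, :, :]).all(axis=-1).any(axis=1)

# Every matching row of b counts, duplicates included.
count_with_duplicates = int(b_in_a.sum())

# Each distinct matching row counts once (set-intersection semantics).
count_distinct = len(np.unique(b[b_in_a], axis=0))

print(count_with_duplicates, count_distinct)
```

Note that this sketch uses the quadratic broadcasting comparison purely for illustration; it does not have the O(n log n) behaviour described above.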


2 Comments

Do you think these two solutions are faster than the ones described above?
Those methods are quadratic in both memory and computation, so for anything more than a few array elements, definitely.

Here is a simple way to do it:

a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])

count = np.count_nonzero(
    np.any(np.all(a[:, np.newaxis, :] == b[np.newaxis, :, :], axis=-1), axis=0))

print(count)
# output: 2
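To see why this works, it may help to split the expression into its steps and look at the intermediate shapes. The following is only an unpacked sketch of the same computation; the intermediate names (eq, rows_equal, b_found) are mine.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])  # shape (5, 2)
b = np.array([[5, 6], [1, 2], [3, 192]])                 # shape (3, 2)

# Broadcast (5, 1, 2) against (1, 3, 2): compare every row of a
# with every row of b, element by element.
eq = a[:, np.newaxis, :] == b[np.newaxis, :, :]   # shape (5, 3, 2)

# A pair of rows matches only if all of its elements match.
rows_equal = np.all(eq, axis=-1)                  # shape (5, 3)

# A row of b is "found" if it matches at least one row of a.
b_found = np.any(rows_equal, axis=0)              # shape (3,)

count = np.count_nonzero(b_found)
print(eq.shape, rows_equal.shape, b_found, count)
```

The intermediate array eq holds len(a) * len(b) * 2 booleans, which is why this approach is quadratic in memory.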

3 Comments

First, shouldn't count be 2? There's only 2 matching arrays in the input. You're counting matching elements (5, 6, 1, 2, and 3) instead of matching arrays ((5,6) and (1,2))
Also, np.logical_or.reduce(.., axis = 0) is equivalent to np.any( . . ., axis = 0), and using np.count_nonzero on a boolean array is wasteful compared to a simple sum()
@DanielF You're right, I misread the question: I thought it was about common elements in general, not subarrays. count_nonzero is significantly faster than sum here, though (check %timeit (np.random.rand(10000000) > .5).sum() and %timeit np.count_nonzero(np.random.rand(10000000) > .5) in IPython).

You can do what you want in a one-liner as follows:

count = sum([np.array_equal(x,y) for x,y in product(a,b)])

Explanation

Here's what is happening:

  1. itertools.product creates an iterator over the Cartesian product of the two arrays, i.e. every pair (x, y) with x taken from a and y taken from b.
  2. np.array_equal compares the two arrays in each pair (x, y) coming from step 1.
  3. sum treats True as 1 and False as 0, so summing the list counts the matching pairs.

Full example:

import numpy as np 
from itertools import product 
a = np.array([ np.array([1,2]), np.array([3,4]), np.array([5,6]), np.array([7,8]), np.array([9,10])])
b = np.array([ np.array([5,6]), np.array([1,2]), np.array([3,192])])
count = sum([np.array_equal(x,y) for x,y in product(a,b)])
# output: 2
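One caveat: because product yields every (x, y) pair, this counts matching pairs, so a row of b that occurs twice in a is counted twice. If each item of b should count at most once, a sketch along the following lines may be closer to the question. The array a_dup is a hypothetical input with a repeated row, used only to show the difference.

```python
import numpy as np
from itertools import product

# Hypothetical a with the row [1, 2] repeated.
a_dup = np.array([[1, 2], [1, 2], [5, 6]])
b = np.array([[5, 6], [1, 2], [3, 192]])

# Counts matching (x, y) pairs: [1, 2] in b matches two rows of a_dup.
pair_count = sum(np.array_equal(x, y) for x, y in product(a_dup, b))

# Counts items of b that match at least one row of a_dup.
item_count = sum(any(np.array_equal(x, y) for x in a_dup) for y in b)

print(pair_count, item_count)
```

Both variants still perform up to len(a) * len(b) comparisons; any() merely stops early once a match for the current item of b is found.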


You can convert the rows to dtype np.void, so that each row becomes a single element, and then use np.in1d on the resulting arrays:

def void_arr(a):
    return np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1]))) 

b[np.in1d(void_arr(b), void_arr(a))]

array([[5, 6],
       [1, 2]])

If you just want the number of intersections, it's

np.in1d(void_arr(b), void_arr(a)).sum()

2

Note: if there are repeated items in b or a, then np.in1d(void_arr(b), void_arr(a)).sum() likely won't be equal to np.in1d(void_arr(a), void_arr(b)).sum(). I've reversed the order from my original answer to match your question (i.e. how many elements of b are in a?).
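The underlying idea is to turn each row into one comparable item so that ordinary membership tests apply. As far as I know, np.in1d is deprecated as of NumPy 2.0 in favour of np.isin, which preserves the input shape instead of flattening it, so the void view would need flattening first. A version-independent sketch of the same idea, swapping the void view for plain Python tuples in a set (this is a different technique, not the code above), could look like this:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
b = np.array([[5, 6], [1, 2], [3, 192]])

# Turn every row of a into a hashable tuple, so membership tests
# are O(1) on average.
rows_of_a = {tuple(row) for row in a.tolist()}

# Count the rows of b (duplicates in b included) that appear in a.
count = sum(tuple(row) in rows_of_a for row in b.tolist())

print(count)
```

This trades vectorization for simplicity: it runs in Python rather than in NumPy, but avoids the quadratic pairwise comparison.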

For more information, see the third answer here
