Find multiple values in a Numpy array

Question

a and b are two NumPy arrays of integers. Both are sorted and contain no repeated values, and b is a subset of a. I need to find the index in a of every element of b. Is there an efficient NumPy function that could help, so I can avoid a Python loop?

(Actually, the arrays are a pandas.DatetimeIndex and NumPy datetime64 values, but I guess that doesn't change the answer.)

NPE · Accepted Answer · 2013-03-04 15:51:43Z

12

numpy.searchsorted() can be used to do this:

In [15]: a = np.array([1, 2, 3, 5, 10, 20, 25])

In [16]: b = np.array([1, 5, 20, 25])

In [17]: a.searchsorted(b)
Out[17]: array([0, 3, 5, 6])

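From what I understand, searchsorted() doesn't require b to be sorted, and it runs a binary search on a for each element of b. This means that it's O(m log n), with m = len(b) and n = len(a), rather than the O(n) that is possible when both arrays are sorted.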
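To make the point about unsorted b concrete, here is a minimal sketch. The array values and the example dates are assumptions chosen for illustration. Because searchsorted() returns insertion points rather than exact matches, the sketch also checks that every looked-up position really holds the requested value, which holds here only because b is a subset of a.

```python
import numpy as np

a = np.array([1, 2, 3, 5, 10, 20, 25])
b = np.array([20, 1, 25, 5])  # deliberately unsorted; only a has to be sorted

# One binary search in a per element of b
idx = np.searchsorted(a, b)

# searchsorted returns insertion points, so confirm each one is an exact hit.
# (This check relies on b being a subset of a; a value larger than a.max()
# would yield index len(a) and the lookup below would raise IndexError.)
assert np.array_equal(a[idx], b)
print(idx)  # [5 0 6 3]

# The same call on datetime64 values (illustrative dates, not from the question)
dates = np.array(["2013-03-01", "2013-03-02", "2013-03-04", "2013-03-05"],
                 dtype="datetime64[D]")
wanted = np.array(["2013-03-02", "2013-03-05"], dtype="datetime64[D]")
date_idx = np.searchsorted(dates, wanted)
print(date_idx)  # [1 3]
```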

If that's not good enough, there's always Cython. :-)

edited Mar 4, 2013 at 15:51

answered Mar 4, 2013 at 15:36


3 Comments

mgilson Over a year ago

I wouldn't even think that they'd need a binary search here ... Under the assumption that both are sorted, you can easily convince yourself that this can be done in O(N) time. (Consider the merge stage of a merge sort.) It'd be interesting to see if a Python implementation could beat this under those assumptions.

NPE Over a year ago

@mgilson: You are quite right that the OP's problem can be solved in O(n). What I am saying is that searchsorted() solves a more general problem, and therefore can't be O(n).

mgilson Over a year ago

Yeah, I was just realizing that. Too bad they don't have a searchdoublesorted function :)
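For reference, the merge-style O(N) idea from the comments could look roughly like the sketch below. The function name merge_indices is hypothetical, and the code assumes exactly what the question states: both arrays are sorted, have no repeats, and b is a subset of a. It is a plain Python loop, so despite the better asymptotic bound it is likely to be slower than searchsorted() on typical inputs unless it is compiled (for example with Cython).

```python
import numpy as np

def merge_indices(a, b):
    """Return the index in a of each element of b (hypothetical helper).

    Assumes a and b are sorted, contain no repeats, and b is a subset of a.
    If b holds a value that is not in a, the scan runs off the end of a
    and raises IndexError.
    """
    out = np.empty(len(b), dtype=np.intp)
    i = 0  # current position in a; it only ever moves forward
    for j, value in enumerate(b):
        while a[i] != value:
            i += 1
        out[j] = i
        i += 1  # values are unique, so the next match lies strictly after i
    return out

a = np.array([1, 2, 3, 5, 10, 20, 25])
b = np.array([1, 5, 20, 25])
print(merge_indices(a, b))  # [0 3 5 6]
```

Each element of a is visited at most once, which is where the linear bound comes from.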
