4

a and b are two Numpy arrays of integers. They are sorted and without repetitions. b is a subset of a. I need to find the index in a of every element of b. Is there an efficient Numpy function that could help, so I can avoid the python loop?

(Actually, the arrays are of pandas.DatetimeIndex and Numpy datetime64, but I guess it doesn't change the answer.)

1 Answer 1

12

numpy.searchsorted() can be used to do this:

In [15]: a = np.array([1, 2, 3, 5, 10, 20, 25])

In [16]: b = np.array([1, 5, 20, 25])

In [17]: a.searchsorted(b)
Out[17]: array([0, 3, 5, 6])

From what I understand, it doesn't require b to be sorted, and uses binary search on a. This means that it's O(n logn) rather than O(n).

If that's not good enough, there's always Cython. :-)

Sign up to request clarification or add additional context in comments.

3 Comments

I wouldn't even think that they'd need a binary search here ... Under the assumption that both are sorted you can easily convince yourself that this can be done in O(N) time. (Consider the merge stage of a merge-sort). I'd be interesting to see if a python implementation could beat this under those assumptions.
@mgilson: You are quite right that the OP's problem can be solved in O(n). What I am saying is that searchsorted() solves a more general problem, and therefore can't be O(n).
Yeah, I was just realizing that. Too bad they don't have a searchdoublesorted function :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.