
The aim here is speed: I am trying to avoid looping through the arrays in question. It can, however, be assumed that both arrays are sorted.

import numpy as np

a = np.arange(10)
b = np.array([2.3, 3.5, 5.8, 13])
c = somefunc(a, b)

Now somefunc should find the indices of a whose values are closest to the values in b, i.e.

In []: c
Out[]: array([2, 3 or 4, 6, 9])  # 3 or 4, depending on how the 3.5 tie is broken

Once again, this could be done with a loop, but I am looking for something much faster. I got quite close with an absolute-difference approach, something like:

np.argmin(np.abs(a[:, np.newaxis] - b), axis=0)

But even this is a little slow, since many unnecessary subtractions are performed.

  • I think you probably want c to be array([2, 3, 6, 9]), because you're comparing with arange(10), which starts from 0. Commented Jun 13, 2017 at 16:33
  • What do you mean when you say your argmin result doesn't give a single index per b value? It does for me... Commented Jun 13, 2017 at 16:34
  • Shouldn't that be [2,4,6,9] instead? Commented Jun 13, 2017 at 17:31
  • Sorry, yes, it should be [2, 3 or 4, 6, 9], depending on whether it is Python 2 or 3. Commented Jun 14, 2017 at 7:16

3 Answers


So, using the suggestion from @Eelco to use searchsorted, I came to the following, which is quicker on a larger dataset than the np.argmin broadcast method:

def finder(a, b):
    dup = np.searchsorted(a, b)      # insertion points of b into a
    uni = np.unique(dup)
    uni = uni[uni < a.shape[0]]      # drop insertion points past the end of a
    ret_b = np.zeros(uni.shape[0], dtype=int)
    for idx, val in enumerate(uni):
        mask = dup == val            # b values sharing this insertion point
        bw = np.argmin(np.abs(a[val] - b[mask]))
        ret_b[idx] = np.where(mask)[0][bw]
    return np.column_stack((uni, ret_b))



Keep track of two pointers, one for the current index into a and one into b. As we increment the a pointer, track the minimum difference (and the index where it occurs) between the elements being pointed to, until the a element exceeds the b element; then update the minimum difference and index one last time (if they change). That gives the answer for the first element of b. Continue the search similarly by advancing the b pointer, picking the a pointer up from where we left off.

Complexity: O(len(a) + len(b)), i.e. linear.
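The two-pointer scan described above can be sketched roughly as follows (this is my own sketch, not the answerer's code; the name closest_two_pointer is hypothetical, and both arrays are assumed sorted ascending):

```python
import numpy as np

def closest_two_pointer(a, b):
    # Hypothetical sketch of the two-pointer idea: for each b[j], advance
    # the a pointer while the next element of a is at least as close.
    # The a pointer never moves backwards, hence O(len(a) + len(b)).
    out = np.empty(len(b), dtype=int)
    i = 0
    for j, bv in enumerate(b):
        while i + 1 < len(a) and abs(a[i + 1] - bv) <= abs(a[i] - bv):
            i += 1
        out[j] = i
    return out
```

Note that with `<=` in the advance condition, ties such as 3.5 resolve to the higher index (4 rather than 3).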

1 Comment

If len(a) is sufficiently large relative to len(b), this can instead be done with len(b) binary searches into a, which improves efficiency.

scipy.spatial.cKDTree is the simplest solution to this problem: it is vectorized and more than likely good enough for your application, though not theoretically optimal given that your data is sorted.
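For the example from the question, that would look something like this (a sketch; note cKDTree expects 2-D data, so the 1-D arrays are reshaped to column vectors):

```python
import numpy as np
from scipy.spatial import cKDTree

a = np.arange(10, dtype=float)
b = np.array([2.3, 3.5, 5.8, 13.0])

# One row per point, one column per dimension.
tree = cKDTree(a[:, None])
dist, idx = tree.query(b[:, None])  # nearest neighbour in a for each b
```

Here idx holds the nearest indices; for the tied value 3.5, either 3 or 4 may be returned.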

Alternatively, you can use numpy.searchsorted: find the left- or right-sided insertion point, then compare that point with its neighbour and keep whichever is closer.
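A minimal sketch of that insertion-point comparison, assuming a is sorted ascending (the name closest_indices is my own):

```python
import numpy as np

def closest_indices(a, b):
    # For each element of b, return the index of the nearest element of a.
    idx = np.searchsorted(a, b)        # insertion points into sorted a
    idx = np.clip(idx, 1, len(a) - 1)  # keep both neighbours in range
    left, right = a[idx - 1], a[idx]
    # Step back one index wherever the left neighbour is strictly closer;
    # exact ties therefore resolve to the right neighbour.
    idx -= (b - left) < (right - b)
    return idx
```

With the arrays from the question, closest_indices(a, b) resolves the 3.5 tie to index 4 and clips the out-of-range 13 to index 9.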

