2

I have a numpy array such as this

[[ 0, 57],
 [ 7, 72],
 [ 2, 51],
 [ 8, 67],
 [ 4, 42]]

I want to find out for each row, how many elements in the 2nd column are within a certain distance (say, 10) of the 2nd column value for that row. So in this example, here the solution would be

[[ 0, 57, 3],
 [ 7, 72, 2],
 [ 2, 51, 3],
 [ 8, 67, 3],
 [ 4, 42, 2]]

So [first row, third column] is 3, because there are 3 elements in the 2nd column (57,51,67) which are within distance 10 from 57. Similarly for each row

Any help would be appreciated!

3
  • How large are your arrays? Commented Sep 5, 2018 at 17:20
  • Around 1 million rows Commented Sep 5, 2018 at 17:27
  • how many neighbors per point do you expect? Commented Sep 6, 2018 at 8:08

2 Answers 2

2

Here's one approach leveraging broadcasting with outer-subtraction -

(np.abs(a[:,1,None] - a[:,1]) <= 10).sum(1)

With outer subtract builtin and count_nonzero for counting -

np.count_nonzero(np.abs(np.subtract.outer(a[:,1],a[:,1]))<=10,axis=1)

Sample run -

# Input array
In [23]: a
Out[23]: 
array([[ 0, 57],
       [ 7, 72],
       [ 2, 51],
       [ 8, 67],
       [ 4, 42]])

# Get count
In [24]: count = (np.abs(a[:,1,None] - a[:,1]) <= 10).sum(1)

In [25]: count
Out[25]: array([3, 2, 3, 3, 2])

# Stack with input
In [26]: np.c_[a,count]
Out[26]: 
array([[ 0, 57,  3],
       [ 7, 72,  2],
       [ 2, 51,  3],
       [ 8, 67,  3],
       [ 4, 42,  2]])

Alternatively with SciPy's cdist -

In [53]: from scipy.spatial.distance import cdist

In [54]: (cdist(a[:,None,1],a[:,1,None], 'minkowski', p=2)<=10).sum(1)
Out[54]: array([3, 2, 3, 3, 2])

For million rows in the input, we might want to resort to a loopy one -

n = len(a)
count = np.empty(n, dtype=int)
for i in range(n):
    count[i] = np.count_nonzero(np.abs(a[:,1]-a[i,1])<=10)
Sign up to request clarification or add additional context in comments.

3 Comments

Thanks for the solution - I tried it out, but received a "MemoryError". My array has around 1 million rows
Thanks! Working now, though really slowly! Also, if I wanted to have simple difference to be less than 10 (rather than absolute), how would I do that?
@alphaomega83 What do you mean by "simple difference"?
1

Here's a non-broadcasting approach, which takes advantage of the fact that to know how many numbers are within 3 of 10, you can subtract the number of numbers <= 13 from those strictly less than 7.

import numpy as np

def broadcast(x, width):
    # for comparison
    return (np.abs(x[:,None] - x) <= width).sum(1)


def largest_leq(arr, x, allow_equal=True):
    maybe = np.searchsorted(arr, x)
    maybe = maybe.clip(0, len(arr) - 1)
    above = arr[maybe] > x if allow_equal else arr[maybe] >= x
    maybe[above] -= 1
    return maybe

def faster(x, width):
    uniq, inv, counts = np.unique(x, return_counts=True, return_inverse=True)
    counts = counts.cumsum()

    low_bounds = uniq - width
    low_ix = largest_leq(uniq, low_bounds, allow_equal=False)
    low_counts = counts[low_ix]
    low_counts[low_ix < 0] = 0

    high_bounds = uniq + width
    high_counts = counts[largest_leq(uniq, high_bounds)]

    delta = high_counts - low_counts
    out = delta[inv]
    return out

This passes my tests:

for width in range(1, 10):
    for window in range(5):
        for trial in range(10):
            x = np.random.randint(0, 10, width)
            b = broadcast(x, window).tolist()
            f = faster(x, window).tolist()
            assert b == f

and behaves pretty well even at larger sizes:

In [171]: x = np.random.random(10**6)

In [172]: %time faster(x, 0)
Wall time: 386 ms
Out[172]: array([1, 1, 1, ..., 1, 1, 1], dtype=int64)

In [173]: %time faster(x, 1)
Wall time: 372 ms
Out[173]: array([1000000, 1000000, 1000000, ..., 1000000, 1000000, 1000000], dtype=int64)

In [174]: x = np.random.randint(0, 10, 10**6)

In [175]: %timeit faster(x, 3)
10 loops, best of 3: 83 ms per loop

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.