I would recommend a completely different approach if you are dealing with large arrays. Right now, you are searching through the entirety of arr for every element in arr2. This is clearly overkill. Instead, you can operate on a sorted arr and simply sum between the insertion points obtained from np.searchsorted.
Sort arr in place if you can:
arr.sort()
You know the width of your interval, so find the bounding values. I'm making the array shaped (20, 2) so that each row holds the lower and upper bound for one element of arr2:
bounds = arr2.reshape(-1, 1) + [-0.2, 0.2]
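To see what the broadcasting does, here is a tiny sanity check with a couple of made-up centers (the _demo names are mine, not part of the solution):

import numpy as np

arr2_demo = np.array([0.3, 0.5])                      # two hypothetical centers
bounds_demo = arr2_demo.reshape(-1, 1) + [-0.2, 0.2]  # column vector + length-2 list
# bounds_demo has shape (2, 2); row i holds the lower and upper bound for
# arr2_demo[i], i.e. roughly [[0.1, 0.5], [0.3, 0.7]] up to float rounding.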
Now find the insertion indices:
ind = np.searchsorted(arr, bounds)
ind is the same shape as bounds. ind[i, :] are the start (inclusive) and end (exclusive) indices into arr that correspond to the ith element of arr2. In other words, for any given i, arr3[i] in the original question is arr[ind[i, 0]:ind[i, 1]].mean(). You can use this directly for a non-vectorized solution:
result = np.array([arr[slice(*i)].mean() for i in ind])
There are a couple of ways to vectorize the solution. In either case, you'll want the number of elements in each run:
n = np.diff(ind, axis=1).ravel()
A quick-and-dirty solution that is prone to rounding errors uses np.cumsum and fancy indexing with ind:
cumulative = np.r_[0, np.cumsum(arr)]
sums = np.diff(cumulative[ind], axis=1).ravel()
result = sums / n
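To see where the rounding errors come from, consider a contrived example (my own toy numbers) where the running total is much larger than an individual window: subtracting two large, nearly equal cumulative sums cancels most of the significant digits.

import numpy as np

x = np.r_[1e9, np.full(10, 0.1)]             # one huge value followed by small ones
c = np.r_[0, np.cumsum(x)]

windowed_via_cumsum = c[11] - c[1]           # sum of the ten 0.1s via cumulative sums
windowed_direct = x[1:11].sum()              # summing only the window elements

print(windowed_via_cumsum, windowed_direct)  # the first is visibly further from 1.0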
A more robust solution would extract only the sums that you actually need using np.add.reduceat:
arr = np.r_[arr, 0] # compensate for index past the end
sums = np.add.reduceat(arr, ind.ravel())[::2]
result = sums / n
You can compare the results of both approaches to arr3 computed in the question to verify that the second approach is noticeably more accurate, even with your toy example.
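A sketch of such a check, assuming arr (sorted, without the trailing zero yet) and ind from above are in scope; the list-comprehension result stands in for the question's arr3:

reference = np.array([arr[slice(*i)].mean() for i in ind])

n = np.diff(ind, axis=1).ravel()
via_cumsum = np.diff(np.r_[0, np.cumsum(arr)][ind], axis=1).ravel() / n
via_reduceat = np.add.reduceat(np.r_[arr, 0], ind.ravel())[::2] / n

print(np.abs(via_cumsum - reference).max())
print(np.abs(via_reduceat - reference).max())
# per the comparison above, the reduceat-based sums track the reference more closely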
Timing
def original(arr, arr2, d):
    arr3 = np.empty_like(arr2)
    for index, num in enumerate(arr2):
        arr3[index] = np.mean(arr[np.abs(num - arr) < d])
    return arr3

def ananda(arr, arr2, d):
    arr_tile = np.tile(arr, (len(arr2), 1))
    arr_tile[np.abs(arr - arr2[:, None]) >= d] = np.nan
    return np.nanmean(arr_tile, axis=1)

def mad_0(arr, arr2, d):
    arr.sort()
    ind = np.searchsorted(arr, arr2.reshape(-1, 1) + [-d, d])
    return np.array([arr[slice(*i)].mean() for i in ind])

def mad_1(arr, arr2, d):
    arr.sort()
    ind = np.searchsorted(arr, arr2.reshape(-1, 1) + [-d, d])
    n = np.diff(ind, axis=1).ravel()
    sums = np.diff(np.r_[0, np.cumsum(arr)][ind], axis=1).ravel()
    return sums / n

def mad_2(arr, arr2, d):
    arr.sort()
    ind = np.searchsorted(arr, arr2.reshape(-1, 1) + [-d, d])
    n = np.diff(ind, axis=1).ravel()
    arr = np.r_[arr, 0]
    sums = np.add.reduceat(arr, ind.ravel())[::2]
    return sums / n
Inputs (reset for each run):
np.random.seed(42)
arr = np.random.rand(100)
arr2 = np.linspace(0, 1, 1000)
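The numbers below have the format of IPython's %timeit output; outside IPython, a plain timeit harness along these lines would do the same job (a sketch: d = 0.2 and the repeat/number choices are mine):

import timeit
import numpy as np

d = 0.2
for func in (original, ananda, mad_0, mad_1, mad_2):
    np.random.seed(42)                 # reset the inputs for each function,
    arr = np.random.rand(100)          # since the mad_* variants sort arr in place
    arr2 = np.linspace(0, 1, 1000)
    best = min(timeit.repeat(lambda: func(arr, arr2, d), number=100, repeat=7)) / 100
    print(f'{func.__name__}: {best * 1e6:.1f} µs per call (best of 7 runs)')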
Results:
original: 25.5 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
ananda: 2.66 ms ± 35.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
mad_0: 14.5 ms ± 48.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
mad_1: 211 µs ± 1.41 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
mad_2: 242 µs ± 1.93 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
For 100 elements with 1k bins, the original approach is ~10x slower than using np.tile. The list comprehension (mad_0) is only ~2x faster than the original approach. While the np.cumsum approach appears to be a little faster than np.add.reduceat, it is potentially not as numerically stable.
Another advantage of using the methods I suggest is that you can change arr2 arbitrarily, while arr only needs to be sorted once.
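For example (a sketch of that pattern, with names of my choosing), you can sort once, keep the cumulative sums around, and answer arbitrarily many queries with different centers:

import numpy as np

def make_interval_mean(arr, d):
    """Pre-sort arr once and return a function that answers queries for any arr2."""
    arr = np.sort(arr)                     # one-time O(n log n) cost
    cumulative = np.r_[0, np.cumsum(arr)]  # reused by every query

    def query(arr2):
        ind = np.searchsorted(arr, arr2.reshape(-1, 1) + [-d, d])
        n = np.diff(ind, axis=1).ravel()
        sums = np.diff(cumulative[ind], axis=1).ravel()
        return sums / n                    # empty intervals would give a divide-by-zero warning

    return query

interval_mean = make_interval_mean(np.random.rand(100), 0.2)
result_a = interval_mean(np.linspace(0, 1, 20))    # the original arr2
result_b = interval_mean(np.random.rand(50))       # a different arr2, no re-sort needed

The same closure works with the np.add.reduceat version of the sums if the extra accuracy matters.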