Here's a partial solution:
as_strided = np.lib.stride_tricks.as_strided
def fun2(refArray):
y = len(range(-halfwindow,halfwindow+1))
II=slice(halfwindow, nrow - halfwindow)
bA = boundsArray[:,II,II]
x=len(range(halfwindow,nrow-halfwindow))
shape = (x,x) + (y,y)
strides = refArray.strides
strides = strides + strides
print(shape, strides)
A = as_strided(refArray, shape=shape,strides=strides)
print(A.shape,(refArray[0,0],refArray[-1,-1]),(A[0,0,0,0],A[-1,-1,-1,-1]))
valid = (A >= bA[0,:,:,None,None]) & \
(A <= bA[1,:,:,None,None])
valid = valid.sum((-2,-1))
# # valid pts per window
print(valid.shape)
return valid
It uses np.lib.stride_tricks.as_strided to create overlapping windows. It's an idea that has been presented in several SO answers.
In timing it is 4-5x faster than your iteration.
A is a (470,470,31,31) view of refArray, i.e. 470x470 windows, each 31x31. And with the strides value they overlap.
But it is partial. valid.sum() gives the same values as your result.sum(). So it finds the same total number of 'valid' points. But valid (after summing on the 31x31 dimensions) is a (470,470) array, one value for each 'window', and that value is the number of valid points within the window. It is the equivalent of doing windows[i-halfwindow,j-halfwindow] = valid.sum() in your iteration.
I think your result is the number of windows in which each point is valid.
I haven't figured out how to map the (470,470,31,31) valid array back on to the (500,500) points. But I pretty sure there is such a mapping.
This is a temporary fix:
def fun2(refArray):
y = len(range(-halfwindow,halfwindow+1))
II=slice(halfwindow, nrow - halfwindow)
bA = boundsArray[:,II,II]
x=len(range(halfwindow,nrow-halfwindow))
shape = (x,x) + (y,y)
strides = refArray.strides
strides = strides + strides
A = as_strided(refArray, shape=shape,strides=strides)
valid = (A >= bA[0,:,:,None,None]) & \
(A <= bA[1,:,:,None,None])
return valid
def fun3(refArray,valid):
result = np.zeros_like(refArray)
n0,n1,n2,n3 = valid.shape
for i in range(n0):
for j in range(n1):
result[ i:(i+n2),j:(j+n3) ] += valid[i,j]
return result
result = fun3(refArray, fun2(refArray))
fun3 has the same sort of iteration as your code, but there's still a 2x speedup.
While each window has the same number of points, the reverse is not true. Corner points occur in only one window each. Middle ones around 900 each. That may make a pure vectorized numpy solution impossible.
boundsArrayandrefArrayalways random, or would these be specified in production code? \$\endgroup\$