I want to average over conditionally selected elements in a 4D numpy array, based on an index using a 3D array.
In other words, my 4D array DATA has these dimensions: [ntime,nz,ny,nx]
where as my 3D array COND which I use to conditionally sample is only a function of [ntime,ny,nx] (with the number of time slices, x and y points identical)
I want to do broadcasting, thus use something like DATA[COND[None,...]] But the problem is that the "missing" vertical dimension is not at the right, but in the middle between time and the x and y space. I could loop over the vertical levels, but I think that would be slow. Is there a way of somehow indexing DATA as
DATA[cond[times],:,COND[ys],COND[xs]]?
Setting up some dummy arrays:
np.random.seed(1234)
COND=np.random.randint(0,2,(2,3,3)) # 2 time levels, 3 X points and 3 y points
DATA=np.random.randint(0,100,(2,2,3,3)) # 2 time levels, 2 Z levels, and 3 x and y points
giving:
COND
array([[[1, 1, 0],
[1, 0, 0],
[0, 1, 1]],
[[1, 1, 1],
[0, 0, 1],
[0, 0, 0]]])
DATA
array([[[[26, 58, 92],
[69, 80, 73],
[47, 50, 76]],
[[37, 34, 38],
[67, 11, 0],
[75, 80, 3]]],
giving:
[[[ 2, 19, 12],
[65, 75, 81],
[14, 71, 60]],
[[46, 28, 81],
[87, 13, 96],
[12, 69, 95]]]])
I can find the argument using argwhere:
idx=np.argwhere(COND==1)
array([[0, 0, 0],
[0, 0, 1],
[0, 1, 0],
[0, 2, 1],
[0, 2, 2],
[1, 0, 0],
[1, 0, 1],
[1, 0, 2],
[1, 1, 2]])
Now I want to do something like
np.mean(DATA[idx[...,None,...]])
or
np.mean(DATA[idx[0],None,idx[1],idx[2])
which should give me an answer with 2 numbers corresponding to the mean DATA values at the times, x and y points when COND=1
This question is related to this: filtering a 3D numpy array according to 2D numpy array
but my klev index is in the middle and not the left or right, so I can't use the [...,None] solution
DATA[np.expand_dims(COND,axis=1)]? Or am I misunderstanding your question