How to average over conditionally selected numpy array entries in a 4D array based on an index from a 3D array

Question

I want to average over conditionally selected elements in a 4D numpy array, based on an index using a 3D array.

In other words, my 4D array DATA has these dimensions: [ntime,nz,ny,nx]

where as my 3D array COND which I use to conditionally sample is only a function of [ntime,ny,nx] (with the number of time slices, x and y points identical)

I want to do broadcasting, thus use something like DATA[COND[None,...]] But the problem is that the "missing" vertical dimension is not at the right, but in the middle between time and the x and y space. I could loop over the vertical levels, but I think that would be slow. Is there a way of somehow indexing DATA as

DATA[cond[times],:,COND[ys],COND[xs]]?

Setting up some dummy arrays:

np.random.seed(1234)
COND=np.random.randint(0,2,(2,3,3))  # 2 time levels, 3 X points and 3 y points
DATA=np.random.randint(0,100,(2,2,3,3)) # 2 time levels, 2 Z levels, and 3 x and y points

giving:

COND
array([[[1, 1, 0],
        [1, 0, 0],
        [0, 1, 1]],

       [[1, 1, 1],
        [0, 0, 1],
        [0, 0, 0]]])

DATA
array([[[[26, 58, 92],
         [69, 80, 73],
         [47, 50, 76]],

        [[37, 34, 38],
         [67, 11,  0],
         [75, 80,  3]]],

giving:

   [[[ 2, 19, 12],
     [65, 75, 81],
     [14, 71, 60]],

    [[46, 28, 81],
     [87, 13, 96],
     [12, 69, 95]]]])

I can find the argument using argwhere:

idx=np.argwhere(COND==1)
array([[0, 0, 0],
       [0, 0, 1],
       [0, 1, 0],
       [0, 2, 1],
       [0, 2, 2],
       [1, 0, 0],
       [1, 0, 1],
       [1, 0, 2],
       [1, 1, 2]])

Now I want to do something like

np.mean(DATA[idx[...,None,...]])

or

np.mean(DATA[idx[0],None,idx[1],idx[2])

which should give me an answer with 2 numbers corresponding to the mean DATA values at the times, x and y points when COND=1

This question is related to this: filtering a 3D numpy array according to 2D numpy array

but my klev index is in the middle and not the left or right, so I can't use the [...,None] solution

Can't you do something like: DATA[np.expand_dims(COND,axis=1)]? Or am I misunderstanding your question — joostblack
– joostblack, Commented Jan 27, 2021 at 14:44

Sayandip Dutta · Accepted Answer · 2021-01-27 15:24:54Z

Using `zip` to get indices along each axis

IIUC, you have done most of the work, i.e. idx

>>> [*zip(*idx)]
[(0, 0, 0, 0, 0, 1, 1, 1, 1),
 (0, 0, 1, 2, 2, 0, 0, 0, 1),
 (0, 1, 0, 1, 2, 0, 1, 2, 2)]

>>> t, y, x = zip(*idx)
>>> DATA[t, :, y, x]

array([[26, 37],
       [58, 34],
       [69, 67],
       [50, 80],
       [76,  3],
       [ 2, 46],
       [19, 28],
       [12, 81],
       [81, 96]])

>>> DATA[t, :, y, x].mean(0)
array([43.66666667, 52.44444444])

Get indices using `np.where`

An easier way to get the numpy.where:

>>> np.where(COND)
(array([0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=int64),
 array([0, 0, 1, 2, 2, 0, 0, 0, 1], dtype=int64),
 array([0, 1, 0, 1, 2, 0, 1, 2, 2], dtype=int64))

Get indices using np.nonzero

Or, numpy.nonzero, probably the most explicit:

>>> np.nonzero(COND)
(array([0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=int64),
 array([0, 0, 1, 2, 2, 0, 0, 0, 1], dtype=int64),
 array([0, 1, 0, 1, 2, 0, 1, 2, 2], dtype=int64))

Use the condition array directly

Notably, a handy trick while dealing with ndarrays is numpy.transpose, as you would have seen in the linked post, in your question, while indexing, dimensions are left aligned, but your array in the current form is not suitable for that kind of indexing, so if your aggregation dimension was at the very right, and index dimensions were to the left, that would do the trick.

So, if your data could be reordered:

Instead of:
dim = (2, 2, 3, 3)
axis-> 0, 1, 2, 3

It were:
dim = (2, 3, 3, 2)
axis-> 0, 2, 3, 1

It would have worked.

Reorder axes using `np.transpose`

You can use numpy.transpose for that:

>>> np.transpose(DATA, axes=(0,2,3,1))[COND==1].mean(axis=0)
array([43.66666667, 52.44444444])

Roll axes using `np.roll`

You could also roll your axis (==1) to the end (i.e. 4th dimension), using numpy.rollaxis:

>>> np.rollaxis(DATA, 1, 4)[COND==1].mean(0)
array([43.66666667, 52.44444444])

Move axes using `np.transpose`

Or, you could move your axis from source dimension to destination dimension, i.e. move axis 1 to axis 3, using np.moveaxis:

>>> np.moveaxis(DATA, source=1, destination=3)[COND==1].mean(0)
array([43.66666667, 52.44444444])

my word, fantastic answer, that did the trick with zip, thank you so much!

Collectives™ on Stack Overflow

How to average over conditionally selected numpy array entries in a 4D array based on an index from a 3D array

1 Answer 1

Using `zip` to get indices along each axis

Get indices using `np.where`

Get indices using np.nonzero

Use the condition array directly

Reorder axes using `np.transpose`

Roll axes using `np.roll`

Move axes using `np.transpose`

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Using zip to get indices along each axis

Get indices using np.where

Get indices using np.nonzero

Use the condition array directly

Reorder axes using np.transpose

Roll axes using np.roll

Move axes using np.transpose

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related

Using `zip` to get indices along each axis

Get indices using `np.where`

Reorder axes using `np.transpose`

Roll axes using `np.roll`

Move axes using `np.transpose`