1

I want to average over conditionally selected elements in a 4D numpy array, based on an index using a 3D array.

In other words, my 4D array DATA has these dimensions: [ntime,nz,ny,nx]

where as my 3D array COND which I use to conditionally sample is only a function of [ntime,ny,nx] (with the number of time slices, x and y points identical)

I want to do broadcasting, thus use something like DATA[COND[None,...]] But the problem is that the "missing" vertical dimension is not at the right, but in the middle between time and the x and y space. I could loop over the vertical levels, but I think that would be slow. Is there a way of somehow indexing DATA as

DATA[cond[times],:,COND[ys],COND[xs]]?

Setting up some dummy arrays:

np.random.seed(1234)
COND=np.random.randint(0,2,(2,3,3))  # 2 time levels, 3 X points and 3 y points
DATA=np.random.randint(0,100,(2,2,3,3)) # 2 time levels, 2 Z levels, and 3 x and y points

giving:

COND
array([[[1, 1, 0],
        [1, 0, 0],
        [0, 1, 1]],

       [[1, 1, 1],
        [0, 0, 1],
        [0, 0, 0]]])

DATA
array([[[[26, 58, 92],
         [69, 80, 73],
         [47, 50, 76]],

        [[37, 34, 38],
         [67, 11,  0],
         [75, 80,  3]]],

giving:

   [[[ 2, 19, 12],
     [65, 75, 81],
     [14, 71, 60]],

    [[46, 28, 81],
     [87, 13, 96],
     [12, 69, 95]]]])

I can find the argument using argwhere:

idx=np.argwhere(COND==1)
array([[0, 0, 0],
       [0, 0, 1],
       [0, 1, 0],
       [0, 2, 1],
       [0, 2, 2],
       [1, 0, 0],
       [1, 0, 1],
       [1, 0, 2],
       [1, 1, 2]])

Now I want to do something like

np.mean(DATA[idx[...,None,...]])

or

np.mean(DATA[idx[0],None,idx[1],idx[2])

which should give me an answer with 2 numbers corresponding to the mean DATA values at the times, x and y points when COND=1

This question is related to this: filtering a 3D numpy array according to 2D numpy array

but my klev index is in the middle and not the left or right, so I can't use the [...,None] solution

1
  • 1
    Can't you do something like: DATA[np.expand_dims(COND,axis=1)]? Or am I misunderstanding your question Commented Jan 27, 2021 at 14:44

1 Answer 1

1

Using zip to get indices along each axis

IIUC, you have done most of the work, i.e. idx

>>> [*zip(*idx)]
[(0, 0, 0, 0, 0, 1, 1, 1, 1),
 (0, 0, 1, 2, 2, 0, 0, 0, 1),
 (0, 1, 0, 1, 2, 0, 1, 2, 2)]

>>> t, y, x = zip(*idx)
>>> DATA[t, :, y, x]

array([[26, 37],
       [58, 34],
       [69, 67],
       [50, 80],
       [76,  3],
       [ 2, 46],
       [19, 28],
       [12, 81],
       [81, 96]])

>>> DATA[t, :, y, x].mean(0)
array([43.66666667, 52.44444444])

Get indices using np.where

An easier way to get the numpy.where:

>>> np.where(COND)
(array([0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=int64),
 array([0, 0, 1, 2, 2, 0, 0, 0, 1], dtype=int64),
 array([0, 1, 0, 1, 2, 0, 1, 2, 2], dtype=int64))

Get indices using np.nonzero

Or, numpy.nonzero, probably the most explicit:

>>> np.nonzero(COND)
(array([0, 0, 0, 0, 0, 1, 1, 1, 1], dtype=int64),
 array([0, 0, 1, 2, 2, 0, 0, 0, 1], dtype=int64),
 array([0, 1, 0, 1, 2, 0, 1, 2, 2], dtype=int64))

Use the condition array directly

Notably, a handy trick while dealing with ndarrays is numpy.transpose, as you would have seen in the linked post, in your question, while indexing, dimensions are left aligned, but your array in the current form is not suitable for that kind of indexing, so if your aggregation dimension was at the very right, and index dimensions were to the left, that would do the trick.

So, if your data could be reordered:

Instead of:
dim = (2, 2, 3, 3)
axis-> 0, 1, 2, 3

It were:
dim = (2, 3, 3, 2)
axis-> 0, 2, 3, 1

It would have worked.

Reorder axes using np.transpose

You can use numpy.transpose for that:

>>> np.transpose(DATA, axes=(0,2,3,1))[COND==1].mean(axis=0)
array([43.66666667, 52.44444444])

Roll axes using np.roll

You could also roll your axis (==1) to the end (i.e. 4th dimension), using numpy.rollaxis:

>>> np.rollaxis(DATA, 1, 4)[COND==1].mean(0)
array([43.66666667, 52.44444444])

Move axes using np.transpose

Or, you could move your axis from source dimension to destination dimension, i.e. move axis 1 to axis 3, using np.moveaxis:

>>> np.moveaxis(DATA, source=1, destination=3)[COND==1].mean(0)
array([43.66666667, 52.44444444]) 
Sign up to request clarification or add additional context in comments.

1 Comment

my word, fantastic answer, that did the trick with zip, thank you so much!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.