How to calculate moving average of NumPy array with varying window sizes defined by an array of indices?

Question

What is the most Pythonic way to average the values in a 2d array (axis=1) based on ranges in a 1d array?

I am trying to average arrays of environmental variables (my 2d array) over every 2 degrees of latitude (my 1d array). I have a latitude array that goes from -33.9 to 29.5. I'd like to average the environmental variables within every 2 degrees from -34 to 30.

The number of elements within each 2 degrees may be different, for example:

arr = array([[5,3,4,5,6,4,2,4,5,8],
             [4,5,8,5,2,3,6,4,1,7],
             [8,3,5,8,5,2,5,9,9,4]])

idx = array([1,1,1,2,2,3,3,3,3,4])

I would then average the values in arr based on idx[0:3], idx[3:9], idx[9].

I would like to get a result of:

arrAvg = array([[4,4.3,8],
                [5.7,3.5,7],
                [5.3,6.3,4]])

Your actual problem may be solvable in other ways that don't involve this procedure. Are you saying you want to average over longitude for each latitude? Or average over small strips of 4 degrees (-2 to +2) that span the globe? Or average only across latitude, leaving the longitude information?
– 9769953, Commented Oct 11, 2018 at 13:53
I only want to average over the range of latitude. The longitude information is not important.
– GeoMonkey, Commented Oct 11, 2018 at 13:59
It looks like the first average value in each inner dimension is from the first three numbers, while the next values are from two numbers each, and the last one is just the last number. Is that correct? Because in that case, a moving average will not work (though I think it's actually better than what your example shows).
– 9769953, Commented Oct 11, 2018 at 14:02
You've interpreted it correctly, but I've made an error there. I'll edit it.
– GeoMonkey, Commented Oct 11, 2018 at 14:06
@Georgy I don't think these are quite the same. The suggested answer does average over an interval, but it averages the values within one array without referencing an index array. That will not work here, as the number of values in arr per index (every 2 degrees) may be different.
– GeoMonkey, Commented Oct 11, 2018 at 14:13

Georgy · Accepted Answer · 2018-10-11 21:56:12Z

@Andyk already explained in his post how to calculate the average given a list of split indices.
I will provide a solution for getting those indices from idx.

Here is a general approach:

from typing import Optional

import numpy as np


def get_split_indices(array: np.ndarray,
                      *,
                      window_size: int,
                      start_value: Optional[int] = None) -> np.ndarray:
    """
    :param array: sorted integer index values with no gaps between
    the first and last value (each value may repeat)
    :param window_size: number of consecutive index values
    grouped into one window
    :param start_value: index value at which the first window starts;
    defaults to the first element of the array
    :return: positions in the array that mark the borders between windows
    """
    if start_value is None:
        start_value = array[0]

    diff = np.diff(array)
    diff_indices = np.where(diff)[0] + 1

    slice_ = slice(window_size - 1 - (array[0] - start_value) % window_size,
                   None,
                   window_size)

    return diff_indices[slice_]

Examples of usage:

Checking it with your example data:

# indices:             3            9
idx = np.array([1,1,1, 2,2,3,3,3,3, 4])

you can get the indices separating different windows like this:

get_split_indices(idx,
                  window_size=2,
                  start_value=0)
>>> array([3, 9])

With this function you can also specify different window sizes:

# indices:                     7        11               17
idx = np.array([0,1,1,2,2,3,3, 4,5,6,7, 8,9,10,11,11,11, 12,13])

get_split_indices(idx,
                  window_size=4,
                  start_value=0)
>>> array([ 7, 11, 17])

and different starting values:

# indices:         1            7      10     13              18
idx = np.array([0, 1,1,2,2,3,3, 4,5,6, 7,8,9, 10,11,11,11,12, 13])
get_split_indices(idx,
                  window_size=3,
                  start_value=-2)
>>> array([ 1,  7, 10, 13, 18])

Note that start_value defaults to the first element of the array.
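To get from these borders to the averages asked for in the question, one option is np.add.reduceat, which sums every window in one call. This is only a sketch: it assumes the borders [3, 9] from the first example above rather than recomputing them, and the names starts, counts and arr_avg are my own.

```python
import numpy as np

arr = np.array([[5, 3, 4, 5, 6, 4, 2, 4, 5, 8],
                [4, 5, 8, 5, 2, 3, 6, 4, 1, 7],
                [8, 3, 5, 8, 5, 2, 5, 9, 9, 4]])

# Borders taken from the first example above (assumed, not recomputed here).
split_indices = np.array([3, 9])

# Every window starts either at column 0 or at one of the borders.
starts = np.concatenate(([0], split_indices))

# Sum of each window along axis=1; the windows are [0:3], [3:9] and [9:].
sums = np.add.reduceat(arr, starts, axis=1)

# Number of columns in each window, needed to turn sums into means.
counts = np.diff(np.append(starts, arr.shape[1]))

arr_avg = sums / counts
print(np.round(arr_avg, 2))
```

Rounded to two decimals, the rows come out as 4, 4.33, 8, then 5.67, 3.5, 7, then 5.33, 6.33, 4. The sketch relies on the borders being strictly increasing, since reduceat does not treat a repeated start as an empty window.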

Feel free to edit the docstring. I am not a native English speaker.
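If you start from float latitudes rather than integer indices, something along these lines might work. It is a rough sketch with made-up data: lat, values, start, width and band_means are hypothetical names, and it assumes the latitudes are sorted in ascending order with one column of values per latitude.

```python
import numpy as np

# Hypothetical data: sorted latitudes and one column of values per latitude.
lat = np.array([-33.9, -33.1, -32.5, -31.0, -30.2, -29.9, -28.4])
values = np.array([[1.0, 3.0, 5.0, 7.0, 9.0, 2.0, 4.0],
                   [2.0, 2.0, 2.0, 4.0, 6.0, 8.0, 0.0]])

start, width = -34.0, 2.0  # bands are [-34, -32), [-32, -30), ...

# Integer label of the 2-degree band each latitude falls into.
idx = np.floor((lat - start) / width).astype(int)

# Positions where the band label changes, i.e. the borders between bands.
split_indices = np.where(np.diff(idx))[0] + 1

# Average each band row-wise and arrange the result as (rows, bands).
band_means = np.vstack([piece.mean(axis=1)
                        for piece in np.hsplit(values, split_indices)]).T
print(band_means)
```

Because the borders are taken wherever the band label changes, bands that contain no latitudes are simply skipped; np.unique(idx) tells you which bands the columns of band_means correspond to.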

kuzand · Answer · 2018-10-11 14:58 (edited by Georgy, 2018-10-11 19:29:33Z)

You could use the np.hsplit function. For your example of indices 0:3, 3:9, 9 it goes like this:

np.hsplit(arr, [3, 9])

which gives you a list of arrays:

[array([[5, 3, 4],
        [4, 5, 8],
        [8, 3, 5]]), 
 array([[5, 6, 4, 2, 4, 5],
        [5, 2, 3, 6, 4, 1],
        [8, 5, 2, 5, 9, 9]]), 
 array([[8],
        [7],
        [4]])]

Then you can compute the mean as follows:

m = [np.mean(a, axis=1) for a in np.hsplit(arr, [3, 9])]

And convert it back to an array:

np.vstack(m).T
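Put together as one runnable snippet with the data from the question (just a sketch; pieces and arr_avg are names I picked for illustration):

```python
import numpy as np

arr = np.array([[5, 3, 4, 5, 6, 4, 2, 4, 5, 8],
                [4, 5, 8, 5, 2, 3, 6, 4, 1, 7],
                [8, 3, 5, 8, 5, 2, 5, 9, 9, 4]])

# Split the columns at positions 3 and 9: columns 0:3, 3:9 and 9:.
pieces = np.hsplit(arr, [3, 9])

# One mean per row for each piece.
m = [np.mean(a, axis=1) for a in pieces]

# Stacking gives shape (windows, rows); transpose to (rows, windows).
arr_avg = np.vstack(m).T
print(arr_avg.shape)  # (3, 3)
```

Each row of arr_avg then holds the three window means for the corresponding row of arr.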

edited Oct 11, 2018 at 19:29 by Georgy

answered Oct 11, 2018 at 14:58 by kuzand

2 Comments

Guimoute Over a year ago

Half of the problem is determining the slicing 0:3, 3:9, 9 from idx.

Georgy Over a year ago

In order not to keep that list m in memory you could make it a generator: m = (np.mean(a, axis=1) for a in np.hsplit(arr, [3, 9])) or mean = partial(np.mean, axis=1); m = map(mean, np.hsplit(arr, [3, 9])).
