Which is the most pythonic way to average the values in a 2d array (axis=1) based on a range in a 1d array?

I am trying to average arrays of environmental variables (my 2d array) over every 2 degrees of latitude (my 1d array). My latitude array goes from -33.9 to 29.5, and I'd like to average the environmental variables within each 2-degree band from -34 to 30.

The number of elements within each 2 degrees may be different, for example:

arr = array([[5,3,4,5,6,4,2,4,5,8],
             [4,5,8,5,2,3,6,4,1,7],
             [8,3,5,8,5,2,5,9,9,4]])

idx = array([1,1,1,2,2,3,3,3,3,4])

I would then average the values in arr based on idx[0:3], idx[3:9], idx[9].

I would like to get a result of:

arrAvg = array([[4,4.3,8],
                [5.7,3.5,7],
                [5.3,6.3,4]])
  • Your actual problem may be solvable in different ways, that don't involve this procedure. Are you saying you want to average over longitude for each latitude? Or average over small strips of 4 degrees (-2 to +2) that span the globe? Or average only across latitude, leaving the longitude information? Commented Oct 11, 2018 at 13:53
  • I only want to average over the range of latitude. The longitude information is not important. Commented Oct 11, 2018 at 13:59
  • It looks like the first average value in each inner dimension is from the first three numbers, while the next values are from two numbers each, and the last one is just the last number. Is that correct? Because in that case, a moving average will not work (though I think it's actually better than what your example shows). Commented Oct 11, 2018 at 14:02
  • You've interpreted it correctly, but I made an error there. I'll edit it. Commented Oct 11, 2018 at 14:06
  • @Georgy I don't think these are quite the same. The suggested answer does average over an interval, but it averages the values within one array without referencing an index array. That will not work here, because the number of values in arr per index (every 2 degrees) may differ. Commented Oct 11, 2018 at 14:13

2 Answers

@Andyk already explained in his post how to calculate the average having a list of indices.
I will provide a solution for getting those indices.

Here is a general approach:

from typing import Optional

import numpy as np


def get_split_indices(array: np.ndarray,
                      *,
                      window_size: int,
                      start_value: Optional[int] = None) -> np.ndarray:
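    """
    Find the positions at which a sorted index array enters a new window.

    :param array: sorted input array of consecutive integer indices
    (every integer between the first and last value must be present)
    :param window_size: number of consecutive index values
    grouped into one window
    :param start_value: index value at which the first window starts;
    defaults to the first element of the array
    :return: array of positions marking the borders between windows
    """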
    if start_value is None:
        start_value = array[0]

    diff = np.diff(array)
    diff_indices = np.where(diff)[0] + 1

    slice_ = slice(window_size - 1 - (array[0] - start_value) % window_size,
                   None,
                   window_size)

    return diff_indices[slice_]

Examples of usage:

Checking it with your example data:

# indices:             3            9
idx = np.array([1,1,1, 2,2,3,3,3,3, 4])

you can get the indices separating different windows like this:

get_split_indices(idx,
                  window_size=2,
                  start_value=0)
>>> array([3, 9])

With this function you can also specify different window sizes:

# indices:                     7        11               17
idx = np.array([0,1,1,2,2,3,3, 4,5,6,7, 8,9,10,11,11,11, 12,13])

get_split_indices(idx,
                  window_size=4,
                  start_value=0)
>>> array([ 7, 11, 17])

and different starting values:

# indices:         1            7      10     13              18
idx = np.array([0, 1,1,2,2,3,3, 4,5,6, 7,8,9, 10,11,11,11,12, 13])
get_split_indices(idx,
                  window_size=3,
                  start_value=-2)
>>> array([ 1,  7, 10, 13, 18])

Note that start_value defaults to the first element of array.
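
To tie this to the averaging step, here is a minimal end-to-end sketch on the data from the question. It is not the function above: it is a simplified variant that labels each column with a window number by integer division and splits where that number changes. The names window_ids, split_points and arr_avg are mine, and the sketch assumes idx is sorted.

```python
import numpy as np

arr = np.array([[5, 3, 4, 5, 6, 4, 2, 4, 5, 8],
                [4, 5, 8, 5, 2, 3, 6, 4, 1, 7],
                [8, 3, 5, 8, 5, 2, 5, 9, 9, 4]])
idx = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3, 4])

window_size, start_value = 2, 0

# Window number per column: index values 0-1 -> 0, 2-3 -> 1, 4-5 -> 2.
window_ids = (idx - start_value) // window_size

# Column positions where the window number changes are the split points.
split_points = np.flatnonzero(np.diff(window_ids)) + 1  # -> [3, 9]

# Split the columns at those positions and average each piece per row.
arr_avg = np.stack(
    [part.mean(axis=1) for part in np.hsplit(arr, split_points)],
    axis=1,
)
print(arr_avg)
```

Because the window number is computed from the index values themselves, this variant should also cope with index values that are missing from idx, which the slicing approach above does not.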

1 Comment

Feel free to edit the docstring. I am not a native English speaker.

You could use the np.hsplit function. For your example of indices 0:3, 3:9, 9 it goes like this:

np.hsplit(arr, [3, 9])

which gives you a list of arrays:

[array([[5, 3, 4],
        [4, 5, 8],
        [8, 3, 5]]), 
 array([[5, 6, 4, 2, 4, 5],
        [5, 2, 3, 6, 4, 1],
        [8, 5, 2, 5, 9, 9]]), 
 array([[8],
        [7],
        [4]])]

Then you can compute the mean as follows:

m = [np.mean(a, axis=1) for a in np.hsplit(arr, [3, 9])]

And convert it back to an array:

np.vstack(m).T
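
If you would rather avoid the Python-level loop, one possible alternative is np.add.reduceat, which sums consecutive column blocks in a single call. This is only a sketch under the assumption that the split points [3, 9] are already known; starts, sums, counts and arr_avg are illustrative names.

```python
import numpy as np

arr = np.array([[5, 3, 4, 5, 6, 4, 2, 4, 5, 8],
                [4, 5, 8, 5, 2, 3, 6, 4, 1, 7],
                [8, 3, 5, 8, 5, 2, 5, 9, 9, 4]])
split_points = [3, 9]

# Start column of every block: 0, 3, 9.
starts = np.r_[0, split_points]

# Sum the columns of each block: arr[:, 0:3], arr[:, 3:9], arr[:, 9:].
sums = np.add.reduceat(arr, starts, axis=1)

# Number of columns per block: 3, 6, 1.
counts = np.diff(np.r_[starts, arr.shape[1]])

# Broadcasting divides every row of block sums by the block sizes.
arr_avg = sums / counts
print(arr_avg)
```

The result has one row per row of arr and one column per block, so no transpose is needed afterwards.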

2 Comments

Half of the problem is determining the slices 0:3, 3:9, 9 from idx.
In order not to keep that list m in memory you could make it a generator: m = (np.mean(a, axis=1) for a in np.hsplit(arr, [3, 9])) or mean = partial(np.mean, axis=1); m = map(mean, np.hsplit(arr, [3, 9])).
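
On the first comment's point about finding the slices: below is a rough sketch that goes straight from float latitudes to 2-degree bands, without building an integer idx by hand. The lat and env arrays are made-up illustration data, not from the question, and the sketch assumes the latitudes are sorted in ascending order.

```python
import numpy as np

# Hypothetical data: 8 sorted latitudes, 2 environmental variables per site.
lat = np.array([-33.9, -33.1, -32.5, -31.0, -30.2, -29.9, -28.4, 29.5])
env = np.arange(16, dtype=float).reshape(2, 8)

lower, width = -34, 2

# Band number per column: [-34, -32) -> 0, [-32, -30) -> 1, and so on.
band_ids = np.floor((lat - lower) / width).astype(int)

# For sorted latitudes, the first occurrence of each band is its start column.
bands, starts = np.unique(band_ids, return_index=True)

# Split before every band except the first, then average each piece per row.
means = np.stack(
    [part.mean(axis=1) for part in np.hsplit(env, starts[1:])],
    axis=1,
)

# Lower latitude edge of each band that actually contains data.
band_edges = lower + width * bands
print(band_edges)
print(means)
```

Only bands that contain at least one latitude appear in the result, so empty bands do not produce NaN columns.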
