python IndexError for masked array

Question

I have data formatted as a 3-dimensional array, and I am performing a regression for each element along one of the axes. The following code works as I expect and returns a list of the regression slopes.

import numpy as np
from scipy import stats
import numpy.ma as ma

#make array
np.random.seed(0)
array = np.random.random((4,3,2))

def regress_slope(array):
  N=array.shape[0]
  alpha=0.9
  y = array[:,:,1]
  x = array[:,:,0] 
  result = [stats.mstats.theilslopes(y[i,...],x[i,...],alpha)[0] for i in range(0,N)]
  return result

result = regress_slope(array)
list(result)
print(result)

My "real" data includes invalid values and I have defined a threshold (<0.1) and tried to mask these values from the array. However, when I use the masked array it throws this error:

array2 = ma.masked_less_equal(array, 0.1)
result2 = regress_slope(array2)
list(result2)

IndexError: index 0 is out of bounds for axis 0 with size 0

I am not sure what this error message means, but I think it might be because there are not enough unmasked values to compute the regression? If this is the case, how could I adjust the code to return nan in the result?

In the first example, you call regress_slope(array) but in the second theil_slope(array2)? I'm assuming that's a typo and you mean to call regress_slope(array2)? — Grismar
– Grismar, Commented Jan 2, 2020 at 0:48

Grismar · Accepted Answer · 2020-01-02 01:31:08Z

You are correct in that the function stats.mstats.theilslopes fails with that error message if there are not enough unmasked values to compute the regression.

A minimal example:

# this works
a = ma.masked_array([1, 2], mask=[0, 0])
b = ma.masked_array([1, 2], mask=[0, 0])
stats.mstats.theilslopes(a, b, 0.95)

# but this does not
b = ma.masked_array([1, 2], mask=[0, 1])
stats.mstats.theilslopes(a, b, 0.95)

The error message indicates that, somewhere in the process of the computation, it tries to access the 1st element on the 1st axis of a result that has no elements.

I don't know enough about the theory of what you're trying to do to know if the result is useful, but this will fix your immediate problem:

import numpy as np
from scipy import stats
import numpy.ma as ma

np.random.seed(0)
a = np.random.random((4, 3, 2))


def regress_slope(arr):
    def safe_first_theilslopes(arr1, arr2, a):
        try:
            return stats.mstats.theilslopes(arr1, arr2, a)[0]
        except IndexError:
            return np.NaN

    n = arr.shape[0]
    alpha = 0.9
    y = arr[:, :, 1]
    x = arr[:, :, 0]
    return [safe_first_theilslopes(y[i, ...], x[i, ...], alpha) for i in range(0, n)]


result = regress_slope(a)
print(result)

a2 = ma.masked_less_equal(a, 0.1)
result2 = regress_slope(a2)
print(result2)

Note how I have the function return either the first element of the function result (stats.mstats.theilslopes(arr1, arr2, a)[0]) or np.NaN, so that is now baked in to that function.

This code works, but throws a few warnings you could suppress but should probably look into first:

RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
\lib\site-packages\numpy\core\_methods.py:161: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)

Collectives™ on Stack Overflow

python IndexError for masked array

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related