0

I have data formatted as a 3-dimensional array, and I am performing a regression for each element along one of the axes. The following code works as I expect and returns a list of the regression slopes.

import numpy as np
from scipy import stats
import numpy.ma as ma

#make array
np.random.seed(0)
array = np.random.random((4,3,2))

def regress_slope(array):
  N=array.shape[0]
  alpha=0.9
  y = array[:,:,1]
  x = array[:,:,0] 
  result = [stats.mstats.theilslopes(y[i,...],x[i,...],alpha)[0] for i in range(0,N)]
  return result

result = regress_slope(array)
list(result)
print(result)

My "real" data includes invalid values and I have defined a threshold (<0.1) and tried to mask these values from the array. However, when I use the masked array it throws this error:

array2 = ma.masked_less_equal(array, 0.1)
result2 = regress_slope(array2)
list(result2)

IndexError: index 0 is out of bounds for axis 0 with size 0

I am not sure what this error message means, but I think it might be because there are not enough unmasked values to compute the regression? If this is the case, how could I adjust the code to return nan in the result?

2
  • 2
    In the first example, you call regress_slope(array) but in the second theil_slope(array2)? I'm assuming that's a typo and you mean to call regress_slope(array2)? Commented Jan 2, 2020 at 0:48
  • Yes sorry that was a typo, it should be regress_slope Commented Jan 2, 2020 at 1:15

1 Answer 1

1

You are correct in that the function stats.mstats.theilslopes fails with that error message if there are not enough unmasked values to compute the regression.

A minimal example:

# this works
a = ma.masked_array([1, 2], mask=[0, 0])
b = ma.masked_array([1, 2], mask=[0, 0])
stats.mstats.theilslopes(a, b, 0.95)

# but this does not
b = ma.masked_array([1, 2], mask=[0, 1])
stats.mstats.theilslopes(a, b, 0.95)

The error message indicates that, somewhere in the process of the computation, it tries to access the 1st element on the 1st axis of a result that has no elements.

I don't know enough about the theory of what you're trying to do to know if the result is useful, but this will fix your immediate problem:

import numpy as np
from scipy import stats
import numpy.ma as ma

np.random.seed(0)
a = np.random.random((4, 3, 2))


def regress_slope(arr):
    def safe_first_theilslopes(arr1, arr2, a):
        try:
            return stats.mstats.theilslopes(arr1, arr2, a)[0]
        except IndexError:
            return np.NaN

    n = arr.shape[0]
    alpha = 0.9
    y = arr[:, :, 1]
    x = arr[:, :, 0]
    return [safe_first_theilslopes(y[i, ...], x[i, ...], alpha) for i in range(0, n)]


result = regress_slope(a)
print(result)

a2 = ma.masked_less_equal(a, 0.1)
result2 = regress_slope(a2)
print(result2)

Note how I have the function return either the first element of the function result (stats.mstats.theilslopes(arr1, arr2, a)[0]) or np.NaN, so that is now baked in to that function.

This code works, but throws a few warnings you could suppress but should probably look into first:

RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
\lib\site-packages\numpy\core\_methods.py:161: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.