numpy slices using column dependent end index from an integer array

Question

If I have an array and I sum it along an axis:

import numpy as np

arr = np.array([[1.,1.,2.],[2.,3.,4.],[4.,5.,6]])
np.sum(arr,axis=1)

I get the total along the three rows ([4.,9.,15.])

My complication is that arr contains data that may be bad after a certain column index. I have an integer array that tells me how many "good" values I have in each row and I want to sum/average over the good values. Say:

ngoodcols=np.array([0,1,2])
np.sum(arr[:,0:ngoodcols],axis=1)  # not legit but this is the idea

It is clear how to do this in a loop, but is there a way to sum only that many values per row, producing [0.,2.,9.], without resorting to looping? Equivalently, I could use nansum if I knew how to set the elements at column indexes at or beyond the corresponding entry of ngoodcols to np.nan, but as far as slicing is concerned this is nearly the same problem.

javidcf · Accepted Answer · 2019-03-06 17:20:17Z

1

One possibility is to use masked arrays:

import numpy as np

arr = np.array([[1., 1., 2.], [2., 3., 4.], [4., 5., 6]])
ngoodcols = np.array([0, 1, 2])
mask = ngoodcols[:, np.newaxis] <= np.arange(arr.shape[1])
arr_masked = np.ma.masked_array(arr, mask)
print(arr_masked)
# [[-- -- --]
#  [2.0 -- --]
#  [4.0 5.0 --]]
print(arr_masked.sum(1))
# [-- 2.0 9.0]

Note that when a row has no good values, the result for that row is a "missing" (masked) value, which may or may not be useful for you. A masked array also lets you easily apply other operations that only consider valid values (mean, etc.).
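As a rough sketch of the "mean, etc." remark, the same masked array can be averaged per row, and the masked results can be converted back to a plain array with filled(). The names row_means and row_sums and the fill values are illustrative choices here, not part of the original answer:

```python
import numpy as np

arr = np.array([[1., 1., 2.], [2., 3., 4.], [4., 5., 6.]])
ngoodcols = np.array([0, 1, 2])

# True marks a bad entry: column index >= number of good values in that row.
mask = ngoodcols[:, np.newaxis] <= np.arange(arr.shape[1])
arr_masked = np.ma.masked_array(arr, mask)

# Mean over the good values only; a row with no good values stays masked,
# and is replaced by NaN here when converting to a plain ndarray.
row_means = arr_masked.mean(axis=1).filled(np.nan)

# Sum over the good values, with fully masked rows replaced by 0.0.
row_sums = arr_masked.sum(axis=1).filled(0.0)

print(row_means)
print(row_sums)
```

Whether NaN or 0.0 is the right placeholder for a row without good values depends on what the downstream code expects.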

Another simple option is to just multiply by the mask:

import numpy as np

arr = np.array([[1., 1., 2.], [2., 3., 4.], [4., 5., 6]])
ngoodcols = np.array([0, 1, 2])
mask = ngoodcols[:, np.newaxis] <= np.arange(arr.shape[1])
print((arr * ~mask).sum(1))
# [0. 2. 9.]

Here when there are no good values you just get zero.
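One caveat with multiplying: if the bad entries are themselves NaN or inf, the product is still NaN, because NaN * 0 and inf * 0 are both NaN. A possible variant, sketched here with made-up bad values and assuming "bad" can include non-finite numbers, selects with np.where instead of multiplying:

```python
import numpy as np

# Same shape as the example above, but the bad entries are NaN/inf
# (illustrative data, not from the question).
arr = np.array([[np.nan, 1., 2.], [2., np.inf, 4.], [4., 5., np.nan]])
ngoodcols = np.array([0, 1, 2])

# True marks a good entry: column index < number of good values in that row.
good = np.arange(arr.shape[1]) < ngoodcols[:, np.newaxis]

# Multiplying by the mask propagates NaN from the bad entries.
with np.errstate(invalid="ignore"):
    naive = (arr * good).sum(axis=1)

# np.where replaces bad entries with 0.0 before summing, so they never
# take part in any arithmetic.
sums = np.where(good, arr, 0.0).sum(axis=1)

# Row means over the good values; rows with no good values are left as NaN.
means = np.divide(sums, ngoodcols,
                  out=np.full_like(sums, np.nan),
                  where=ngoodcols > 0)

print(naive)
print(sums)
print(means)
```

When the bad values are known to be finite, the multiplication above gives the same sums.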


1 Comment

Eli S Over a year ago

Also a shout-out to Austin's answer, which uses the same arange trick with nan values.

Austin · Accepted Answer · 2019-03-06 17:16:08Z

1

Here is one way using Boolean indexing. It sets the elements whose column index is greater than or equal to the corresponding entry in ngoodcols to np.nan and then uses np.nansum:

import numpy as np

arr = np.array([[1.,1.,2.],[2.,3.,4.],[4.,5.,6]])
ngoodcols = np.array([0,1,2])

arr[np.asarray(ngoodcols)[:,None] <= np.arange(arr.shape[1])] = np.nan

print(np.nansum(arr, axis=1))
# [ 0.  2.  9.]
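The assignment above overwrites arr in place. If the original data is still needed afterwards, a non-mutating variant might look like the following sketch, where bad and arr_nan are names chosen for illustration:

```python
import numpy as np

arr = np.array([[1., 1., 2.], [2., 3., 4.], [4., 5., 6.]])
ngoodcols = np.array([0, 1, 2])

# True marks a bad entry: column index >= number of good values in that row.
bad = ngoodcols[:, None] <= np.arange(arr.shape[1])

# np.where builds a new array with NaN in the bad positions; arr is untouched.
arr_nan = np.where(bad, np.nan, arr)

sums = np.nansum(arr_nan, axis=1)
print(sums)
```

np.nansum treats NaN as zero, so a row with no good values sums to 0.0.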
