3

If I have an array and I apply summation

arr = np.array([[1.,1.,2.],[2.,3.,4.],[4.,5.,6]])
np.sum(arr,axis=1)

I get the total along the three rows ([4.,9.,15.])

My complication is that arr contains data that may be bad after a certain column index. I have an integer array that tells me how many "good" values I have in each row and I want to sum/average over the good values. Say:

ngoodcols=np.array([0,1,2])
np.sum(arr[:,0:ngoodcols],axis=1)  # not legit but this is the idea

It is clear how to do this in a loop, but is there a way to sum only that many, producing [0.,2.,9.] without resorting to looping? Equivalently, I could use nansum if I knew how to set the elements in column indexes higher than b equal to np.nan, but this is a nearly equivalent problem as far as slicing is concerned.

2 Answers 2

1

One possibility is to use masked arrays:

import numpy as np

arr = np.array([[1., 1., 2.], [2., 3., 4.], [4., 5., 6]])
ngoodcols = np.array([0, 1, 2])
mask = ngoodcols[:, np.newaxis] <= np.arange(arr.shape[1])
arr_masked = np.ma.masked_array(arr, mask)
print(arr_masked)
# [[-- -- --]
#  [2.0 -- --]
#  [4.0 5.0 --]]
print(arr_masked.sum(1))
# [-- 2.0 9.0]

Note that here when there are not good values you get a "missing" value as a result, which may or may not be useful for you. Also, a masked array also allows you to easily do other operations that only apply for valid values (mean, etc.).

Another simple option is to just multiply by the mask:

import numpy as np

arr = np.array([[1., 1., 2.], [2., 3., 4.], [4., 5., 6]])
ngoodcols = np.array([0, 1, 2])
mask = ngoodcols[:, np.newaxis] <= np.arange(arr.shape[1])
print((arr * ~mask).sum(1))
# [0. 2. 9.]

Here when there are no good values you just get zero.

Sign up to request clarification or add additional context in comments.

1 Comment

Also a shout-out to Austin's answer, which uses the same arange trick with nan values.
1

Here is one way using Boolean indexing. This sets elements in column indexes higher than ones in ngoodcols equal to np.nan and use np.nansum:

import numpy as np

arr = np.array([[1.,1.,2.],[2.,3.,4.],[4.,5.,6]])
ngoodcols = np.array([0,1,2])

arr[np.asarray(ngoodcols)[:,None] <= np.arange(arr.shape[1])] = np.nan

print(np.nansum(arr, axis=1))
# [ 0.  2.  9.]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.