Set values in numpy array to NaN by index

Question

I want to set specific values in a numpy array to NaN (to exclude them from a row-wise mean calculation).

I tried

import numpy

x = numpy.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]])
cutoff = [5, 7]
for i in range(len(x)):
    x[i][0:cutoff[i]:1] = numpy.nan

Looking at x, I only see -9223372036854775808 where I expect NaN.

I thought about an alternative:

for i in range(len(x)):
    for k in range(cutoff[i]):
        x[i][k] = numpy.nan

Nothing happens. What am I doing wrong?

Can you have NaNs in an integer array? Does it work with dtype=float, e.g. x[0][0:5] = np.nan; x[1][0:7] = np.nan?
– Padraic Cunningham, Commented May 3, 2015 at 21:34
If you use @Divakar's solution you can avoid the NaN issue entirely: mask = np.asarray(cutoff)[:,None] > np.arange(x.shape[1]); answer = np.ma.masked_where(mask, x).mean(axis=1)
– paddyg, Commented May 3, 2015 at 22:58

unutbu · Accepted Answer · 2015-05-03 21:36:16Z

nan is a floating-point value, so an array with an integer dtype cannot hold it. When nan is assigned to an integer array, the value is cast to an int, which produces a meaningless number (the exact result depends on the platform and NumPy version, and newer versions may raise an error instead):

In [85]: np.array(np.nan).astype(int).item()
Out[85]: -9223372036854775808

So to fix your code, make x an array of float dtype:

x = numpy.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]], 
                dtype=float)

import numpy

x = numpy.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]], 
                dtype=float)
cutoff = [5, 7]
for i in range(len(x)):
    x[i][0:cutoff[i]:1] = numpy.nan
print(x)

yields

array([[ nan,  nan,  nan,  nan,  nan,   5.,   6.,   7.,   8.,   9.],
       [ nan,  nan,  nan,  nan,  nan,  nan,  nan,   0.,   1.,   0.]])
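Since the stated goal is a row-wise mean that skips the cut-off values, the last step could look like the sketch below. It assumes np.nanmean fits the use case; that function is not part of the original answer.

```python
import numpy as np

x = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
              [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]], dtype=float)
cutoff = [5, 7]

# Blank out the first cutoff[i] entries of each row
for i, c in enumerate(cutoff):
    x[i, :c] = np.nan

# nanmean ignores NaN entries: row 0 averages 5..9, row 1 averages 0, 1, 0
row_means = np.nanmean(x, axis=1)
print(row_means)
```

A plain x.mean(axis=1) would return nan for both rows here, because any NaN in a row propagates into that row's mean.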

Community · Accepted Answer · 2017-05-23 11:46:31Z

Vectorized approach to set appropriate elements as NaNs

@unutbu's solution should get rid of the ValueError you were getting. If you want to vectorize for performance, you can use boolean indexing like so -

import numpy as np

# Create mask of positions in x (with float datatype) where NaNs are to be put
mask = np.asarray(cutoff)[:,None] > np.arange(x.shape[1])

# Put NaNs into masked region of x for the desired output
x[mask] = np.nan

Sample run -

In [92]: x = np.random.randint(0,9,(4,7)).astype(float)

In [93]: x
Out[93]: 
array([[ 2.,  1.,  5.,  2.,  5.,  2.,  1.],
       [ 2.,  5.,  7.,  1.,  5.,  4.,  8.],
       [ 1.,  1.,  7.,  4.,  8.,  3.,  1.],
       [ 5.,  8.,  7.,  5.,  0.,  2.,  1.]])

In [94]: cutoff = [5,3,0,6]

In [95]: x[np.asarray(cutoff)[:,None] > np.arange(x.shape[1])] = np.nan

In [96]: x
Out[96]: 
array([[ nan,  nan,  nan,  nan,  nan,   2.,   1.],
       [ nan,  nan,  nan,   1.,   5.,   4.,   8.],
       [  1.,   1.,   7.,   4.,   8.,   3.,   1.],
       [ nan,  nan,  nan,  nan,  nan,  nan,   1.]])
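The same idea can be wrapped so that an integer input is converted to float before masking. This is only a sketch: nan_before_cutoff is a hypothetical helper name, not something from NumPy or from the answer above.

```python
import numpy as np

def nan_before_cutoff(x, cutoff):
    """Hypothetical helper: float copy of x with the first cutoff[i]
    entries of row i replaced by NaN."""
    out = np.array(x, dtype=float)        # float copy, input is left untouched
    cols = np.arange(out.shape[1])        # shape (ncols,)
    cut = np.asarray(cutoff)[:, None]     # shape (nrows, 1)
    mask = cut > cols                     # broadcasts to (nrows, ncols)
    out[mask] = np.nan
    return out

x_int = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                  [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]])
result = nan_before_cutoff(x_int, [5, 7])
print(result)
```

The comparison works because a column of cutoffs with shape (nrows, 1) is broadcast against a row of column indices with shape (ncols,), giving one boolean per element of x.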

Vectorized approach to directly calculate row-wise mean of appropriate elements

If the goal is the masked mean values, you can modify the vectorized approach above to avoid dealing with NaNs altogether and, more importantly, keep x as an integer array. Here's the modified approach -

# Get array version of cutoff
cutoff_arr = np.asarray(cutoff)

# Mask of positions in x which are to be considered for row-wise mean calculations
mask1 = cutoff_arr[:,None] <= np.arange(x.shape[1])

# Mask x, calculate the corresponding sum and thus mean values for each row
masked_mean_vals = (mask1*x).sum(1)/(x.shape[1] -  cutoff_arr)

Here's a sample run for such a solution -

In [61]: x = np.random.randint(0,9,(4,7))

In [62]: x
Out[62]: 
array([[5, 0, 1, 2, 4, 2, 0],
       [3, 2, 0, 7, 5, 0, 2],
       [7, 2, 2, 3, 3, 2, 3],
       [4, 1, 2, 1, 4, 6, 8]])

In [63]: cutoff = [5,3,0,6]

In [64]: cutoff_arr = np.asarray(cutoff)

In [65]: mask1 = cutoff_arr[:,None] <= np.arange(x.shape[1])

In [66]: mask1
Out[66]: 
array([[False, False, False, False, False,  True,  True],
       [False, False, False,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True],
       [False, False, False, False, False, False,  True]], dtype=bool)

In [67]: masked_mean_vals = (mask1*x).sum(1)/(x.shape[1] -  cutoff_arr)

In [68]: masked_mean_vals
Out[68]: array([ 1.        ,  3.5       ,  3.14285714,  8.        ])
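As a cross-check, the masked-array route suggested in the comments under the question should give the same means on this sample data. The sketch below reuses the sample values above; the names exclude, keep, ma_means and sum_means are chosen here for illustration.

```python
import numpy as np

x = np.array([[5, 0, 1, 2, 4, 2, 0],
              [3, 2, 0, 7, 5, 0, 2],
              [7, 2, 2, 3, 3, 2, 3],
              [4, 1, 2, 1, 4, 6, 8]])
cutoff_arr = np.asarray([5, 3, 0, 6])

# True where an element should be left out of the mean
exclude = cutoff_arr[:, None] > np.arange(x.shape[1])

# Masked-array version: masked entries are ignored by mean()
ma_means = np.ma.masked_where(exclude, x).mean(axis=1)

# Sum-and-divide version from the answer above
keep = ~exclude
sum_means = (keep * x).sum(axis=1) / (x.shape[1] - cutoff_arr)

print(np.allclose(ma_means.filled(np.nan), sum_means))
```

One difference to keep in mind: if a cutoff equals the row length, the sum-and-divide version divides by zero for that row, while the masked-array mean returns a masked entry.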
