Is there any way to make this work? The array I'm working on contains None values, which should be ignored during processing. For example, I would like to normalize this array:

output = np.array([[1,2,None,4,5],[None,7,8,9,10]])
mu = np.mean(output, axis=(0,1), keepdims=True)
sd = np.std(output, axis=(0,1), keepdims=True)
normalized_output = (output - mu)/sd

Expected outcome:

array([[-1.5666989 , -1.21854359, None, -0.52223297, -0.17407766],
       [ None,  0.52223297,  0.87038828,  1.21854359,  1.5666989 ]])

Edit: As suggested, it is better to use NaN instead of None. How can I get this to work with NaN?

output = np.array([[1,2,np.nan,4,5],[np.nan,7,8,9,10]])
mu = np.mean(output, axis=(0,1), keepdims=True)
sd = np.std(output, axis=(0,1), keepdims=True)
normalized_output = (output - mu)/sd
print(normalized_output)
# array([[nan, nan, nan, nan, nan],
#        [nan, nan, nan, nan, nan]])
  • If you have None in your array, this is a very bad sign: it means the array has dtype object, so computations on it are not optimized. Consider using NaN values instead, which are native floating-point values. Commented Apr 25, 2021 at 15:00
  • Thanks for your input. I didn't know None is bad for arrays. I can use where to change it to NaN. I've updated the question with your suggestion. Commented Apr 25, 2021 at 15:22
  • If you want to keep your values integers, use a masked array instead of NaN. I regard NaN as the result of a bad computation (0/0 for example), while a masked value indicates the absence of the value: two different things. NaN is often used for both, but that can lead to confusion. Commented Apr 25, 2021 at 15:31
  • NaNs are also taken into account when calculating, for example, a mean value. There are special nanmean functions, but here, I think a masked array is more appropriate. Commented Apr 25, 2021 at 15:32
  • Does this answer your question? NumPy: calculate averages with NaNs removed Commented Apr 25, 2021 at 15:38

2 Answers

You can perform calculations that skip certain values by using NumPy masked arrays.

A function already exists to create a masked array that masks NaN values: ma.masked_invalid.

It can be used like so:

import numpy as np
from numpy import ma


output = ma.masked_invalid([[1,2,np.nan,4,5],[np.nan,7,8,9,10]])

mu = np.mean(output, axis=(0,1), keepdims=True)
sd = np.std(output, axis=(0,1), keepdims=True)
normalized_output = (output - mu)/sd
print(normalized_output)

Output (-- represents a masked value):

[[-1.5461980716652028 -1.2206826881567392 -- -0.5696519211398116
  -0.24413653763134782]
 [-- 0.40689422938557973 0.7324096128940435 1.0579249964025073
  1.3834403799109711]]
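
If the data has to stay integer, as one of the comments on the question suggests, a masked array can also be built with an explicit mask instead of NaN. The following is only a minimal sketch: it assumes the positions of the missing values are known in advance, and the 0 placeholders under the mask are an arbitrary choice for illustration.

```python
import numpy as np
from numpy import ma

# Placeholder values (0 here, chosen arbitrarily) sit under the mask and are ignored.
data = np.array([[1, 2, 0, 4, 5], [0, 7, 8, 9, 10]])
mask = np.array([[True if v == 0 else False for v in row] for row in data])
output = ma.masked_array(data, mask=mask)

mu = output.mean()  # mean over the 8 unmasked values
sd = output.std()   # population standard deviation (ddof=0) over the same values
normalized_output = (output - mu) / sd

print(output.dtype)                      # the integer dtype of the input is kept
print(normalized_output.filled(np.nan))  # plain float array with NaN at masked positions
```

The normalized result is necessarily a float array, but the original data keeps its integer dtype, and `filled` converts back to an ordinary array when the mask is no longer needed.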

You can use the np.nanmean and np.nanstd functions instead of np.mean and np.std:

output = np.array([[1,2,np.nan,4,5],[np.nan,7,8,9,10]])
mu = np.nanmean(output, axis=(0,1), keepdims=True)
sd = np.nanstd(output, axis=(0,1), keepdims=True)
normalized_output = (output - mu)/sd

You will get output like this:

array([[-1.54619807, -1.22068269,         nan, -0.56965192, -0.24413654],
       [        nan,  0.40689423,  0.73240961,  1.057925  ,  1.38344038]])

It differs from your desired output because np.nanmean and np.nanstd ignore the NaN values in the array, so the mean and standard deviation are computed over 8 elements instead of 10.
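
To make the difference concrete, here is a small check. It rests on an assumption that is not stated in the question: that the expected numbers were computed from the full range 1 to 10, with 3 and 6 in place of the missing entries. The first normalized value under that assumption matches the question's expected output.

```python
import numpy as np

output = np.array([[1, 2, np.nan, 4, 5], [np.nan, 7, 8, 9, 10]])

# NaN-aware statistics use only the 8 valid entries.
mu_nan = np.nanmean(output)
sd_nan = np.nanstd(output)

# Assumed origin of the question's expected values: all ten numbers 1..10.
full = np.arange(1, 11, dtype=float)
mu_full = full.mean()
sd_full = full.std()

first_nan = (1 - mu_nan) / sd_nan     # first entry when NaNs are ignored
first_full = (1 - mu_full) / sd_full  # first entry when 1..10 are all used

print(mu_nan, mu_full)
print(first_nan, first_full)
```

The means are 5.75 and 5.5 respectively, which is why every normalized entry shifts slightly between the two results.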

1 Comment

Note that this changes the dtype of output from int64 to float64.
