
I have a short example script:

import numpy as np

print('numpy version:       ', np.version.version)

foo = np.full(10, 5)
bar = np.full(10, np.nan)

print('foo:                 ', foo)
print('Unique values of foo:', np.unique(foo))

print('bar:                 ', bar)
print('Unique values of bar:', np.unique(bar))

It prints the following result:

numpy version:        1.16.4
foo:                  [5 5 5 5 5 5 5 5 5 5]
Unique values of foo: [5]
bar:                  [nan nan nan nan nan nan nan nan nan nan]
Unique values of bar: [nan nan nan nan nan nan nan nan nan nan]

My questions:

  1. Why doesn't np.unique() return just a single nan value when it receives bar as input? Surely this is an error, right? Or if it's the correct, expected behavior, then why is it correct?
  2. What is the recommended workaround, if any, for obtaining the more typical behavior illustrated by foo?

3 Answers


To answer your question of why: the IEEE 754 spec for floating-point numbers, which is how NumPy represents NaN, says that NaN is not equal to anything, including itself. NumPy respects this, which is why np.nan == np.nan is False.

People complain about this, but it's a hard choice to make because NaN can arise from things that are not equal. For example, should this expression be true?

np.sqrt(-1) == np.sqrt(-2) 

Both evaluate to NaN, but saying the above should be true seems very wrong. You need to decide how to handle NaN in your code; if you want to treat them all the same way, you certainly can.
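
A quick demonstration of those comparison rules (the np.errstate context is only there to silence the expected "invalid value" warning from the negative square roots):

import numpy as np

with np.errstate(invalid='ignore'):   # suppress the expected RuntimeWarning
    a = np.sqrt(-1.0)                 # nan
    b = np.sqrt(-2.0)                 # nan

print(a == b)            # False: NaN never compares equal, even to itself
print(np.nan == np.nan)  # False, for the same reason
print(np.isnan(a))       # True: np.isnan is the reliable way to detect NaN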




First question:

As you can see:

>>> np.nan == np.nan
False
>>>

NaN values don't compare equal to each other, not even to themselves.

Second question:

It can't be pretty; the only workaround I can think of is:

>>> a = np.unique(np.where(np.isnan(bar), 0, bar))
>>> np.where(a == 0, np.nan, a)
array([ nan])
>>> 

2 Comments

I like your workaround, although of course it seems like it would get a slightly incorrect result if bar included any 0 values in addition to the nans. But at least it could be made to work in most situations.
@stachyra Yeah NaNs are not easy to deal with
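
As the comments point out, the zero placeholder collides with any real zeros in the data. A rough sketch of an alternative that avoids that, assuming all you need is the set of unique values with every NaN collapsed into one:

import numpy as np

bar = np.array([0.0, 5.0, np.nan, 5.0, np.nan])

mask = np.isnan(bar)
uniq = np.unique(bar[~mask])          # unique of the non-NaN values only
if mask.any():
    uniq = np.append(uniq, np.nan)    # put a single NaN back if any were present

print(uniq)   # [ 0.  5. nan]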

I'm not sure I can fully recommend it, but if the sort order of the unique elements doesn't matter:

# make an example array containing NaNs
x = np.arange(15) % 5 - 2
y = x.astype(bool) / x       # 0/0 produces NaN (and a RuntimeWarning)
y
# array([-0.5, -1. ,  nan,  1. ,  0.5, -0.5, -1. ,  nan,  1. ,  0.5, -0.5,
#        -1. ,  nan,  1. ,  0.5])

# the trick: view the float bits as 64-bit ints, deduplicate, then view back
np.unique(y.view(np.int64)).view(np.float64)
# array([-0.5, -1. ,  nan,  0.5,  1. ])

Be warned, though, that this mapping between int and float bit patterns is not 100% one-to-one; for example:

(np.array(np.nan).view(np.int64) + 1).view(np.float64)
# nan

This last nan has a different bit pattern, so it would not be treated as equal to a standard nan even after the cast-to-int-and-back trick.
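
A small check of that caveat (a sketch assuming 64-bit floats): flipping one payload bit yields a second, distinct NaN, and the view trick keeps both:

import numpy as np

nan_a = np.float64(np.nan)
# flip the lowest payload bit to build a NaN with a different bit pattern
nan_b = (np.array(nan_a).view(np.int64) + 1).view(np.float64)[()]

z = np.array([nan_a, nan_b, nan_a, nan_b])
out = np.unique(z.view(np.int64)).view(np.float64)
print(out)        # both entries print as nan, but they are kept separate
print(out.size)   # 2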

