
I have a short example script:

import numpy as np

print('numpy version:       ', np.version.version)

foo = np.full(10, 5)
bar = np.full(10, np.nan)

print('foo:                 ', foo)
print('Unique values of foo:', np.unique(foo))

print('bar:                 ', bar)
print('Unique values of bar:', np.unique(bar))

It prints the following result:

numpy version:        1.16.4
foo:                  [5 5 5 5 5 5 5 5 5 5]
Unique values of foo: [5]
bar:                  [nan nan nan nan nan nan nan nan nan nan]
Unique values of bar: [nan nan nan nan nan nan nan nan nan nan]

My questions:

  1. Why doesn't np.unique() return just a single nan value when it receives bar as input? Surely this is an error, right? Or if it's the correct, expected behavior, then why is it correct?
  2. What is the recommended workaround, if any, for obtaining the more typical behavior illustrated by foo?

3 Answers


To answer your question of why: the IEEE 754 spec for floating-point numbers, which is how NumPy represents NaN, says that NaN is not equal to anything, including itself. NumPy respects this, which is why np.nan == np.nan is False.

People complain about this, but it's a hard choice to make because NaN can arise from things that are not equal. For example, should this expression be true?

np.sqrt(-1) == np.sqrt(-2) 

Both evaluate to NaN, but saying the above should be true seems very wrong. You need to decide how to handle NaN in your code; if you want to treat them all the same way, you certainly can.
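
A quick demonstration of those comparison rules (the np.errstate context is only there to silence the expected "invalid value" warning from the negative square roots):

import numpy as np

with np.errstate(invalid='ignore'):   # suppress the expected RuntimeWarning
    a = np.sqrt(-1.0)                 # nan
    b = np.sqrt(-2.0)                 # nan

print(a == b)            # False: NaN never compares equal, even to itself
print(np.nan == np.nan)  # False, for the same reason
print(np.isnan(a))       # True: np.isnan is the reliable way to detect NaN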




First question:

As you can see:

>>> np.nan == np.nan
False
>>>

NaN values don't compare equal to each other, not even to themselves.

Second question:

It can't be pretty; the only workaround I can think of is:

>>> a = np.unique(np.where(np.isnan(bar), 0, bar))
>>> np.where(a == 0, np.nan, a)
array([ nan])
>>> 

2 Comments

I like your workaround, although of course it seems like it would get a slightly incorrect result if bar included any 0 values in addition to the nans. But at least it could be made to work in most situations.
@stachyra Yeah NaNs are not easy to deal with
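
As the comments point out, the zero placeholder collides with any real zeros in the data. A rough sketch of an alternative that avoids that, assuming all you need is the set of unique values with every NaN collapsed into one:

import numpy as np

bar = np.array([0.0, 5.0, np.nan, 5.0, np.nan])

mask = np.isnan(bar)
uniq = np.unique(bar[~mask])          # unique of the non-NaN values only
if mask.any():
    uniq = np.append(uniq, np.nan)    # put a single NaN back if any were present

print(uniq)   # [ 0.  5. nan]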

I'm not sure I can fully recommend it, but if the sort order of the unique elements doesn't matter:

# make an example array containing NaNs
x = np.arange(15) % 5 - 2
y = x.astype(bool) / x       # 0/0 produces NaN (and a RuntimeWarning)
y
# array([-0.5, -1. ,  nan,  1. ,  0.5, -0.5, -1. ,  nan,  1. ,  0.5, -0.5,
#        -1. ,  nan,  1. ,  0.5])

# the trick: view the float bits as 64-bit ints, deduplicate, then view back
np.unique(y.view(np.int64)).view(np.float64)
# array([-0.5, -1. ,  nan,  0.5,  1. ])

Be warned, though, that this mapping between int and float bit patterns is not 100% one-to-one; for example:

(np.array(np.nan).view(np.int64) + 1).view(np.float64)
# nan

This last nan has a different bit pattern, so it would not be treated as equal to a standard nan even after the cast-to-int-and-back trick.
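
A small check of that caveat (a sketch assuming 64-bit floats): flipping one payload bit yields a second, distinct NaN, and the view trick keeps both:

import numpy as np

nan_a = np.float64(np.nan)
# flip the lowest payload bit to build a NaN with a different bit pattern
nan_b = (np.array(nan_a).view(np.int64) + 1).view(np.float64)[()]

z = np.array([nan_a, nan_b, nan_a, nan_b])
out = np.unique(z.view(np.int64)).view(np.float64)
print(out)        # both entries print as nan, but they are kept separate
print(out.size)   # 2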

