How to make Numpy comparisons involving NaN to return NaN instead of False?

Question

I'm doing equality comparisons of some Series that contain both NaN and numeric elements. I'd like every comparison involving a NaN to return NaN instead of False. What's the best NumPy function to do this?

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, np.nan, 1], 'b': [np.nan, 1, 1]})

df['a'] == df['b']

gives

0    False
1    False
2     True
dtype: bool

when I'd like it to return

0    NaN
1    NaN
2    1
dtype: float

or

0    NaN
1    NaN
2    True
dtype: bool
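The False results come from IEEE 754 floating-point semantics, which NumPy and pandas follow: NaN compares unequal to every value, including itself. A minimal check with plain NumPy arrays holding the same values as the question's columns:

```python
import numpy as np

a = np.array([np.nan, np.nan, 1.0])
b = np.array([np.nan, 1.0, 1.0])

# NaN is unequal to every value, itself included
print(np.nan == np.nan)  # False
print(np.nan != np.nan)  # True

# The comparison is element-wise, so every position involving a NaN is False
eq = a == b
print(eq)  # [False False  True]
```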

Can you give a specific example? (minimal reproducible example). There are probably many ways to do it depending on the exact context.
– mozway, Commented Jan 13 at 18:21
^ What mozway said. For some specific tips, see How to make good reproducible pandas examples.
– wjandrea, Commented Jan 13 at 18:30
Added, sorry, thought my description was clear enough at first. Thanks!
– Faraz Masroor, Commented Jan 13 at 18:34
NaN is not a valid value for an array of dtype bool or int. Both of your options would be impossible as they require a dtype of object or float.
– M. Zhang, Commented Jan 13 at 18:46
Ok fine I'll change my requirement then. I don't actually care about the dtype, as long as there is a NaN and a Falsy value and a Truthy value.
– Faraz Masroor, Commented Jan 13 at 18:49
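M. Zhang's point about dtypes can be checked directly. The sketch below (standard pandas/NumPy behavior, no extra assumptions) shows that a NaN pushes integers to float64 and booleans to object, that a NumPy int array refuses NaN outright, and that a NumPy bool array silently stores it as True:

```python
import numpy as np
import pandas as pd

# A NaN forces an integer Series up to float64
ints = pd.Series([1, np.nan])
print(ints.dtype)  # float64

# Mixing booleans and NaN falls back to the generic object dtype
bools = pd.Series([True, np.nan])
print(bools.dtype)  # object

# A NumPy int array cannot store NaN at all...
arr = np.array([1, 2])
try:
    arr[0] = np.nan
except ValueError as exc:
    print("int array:", exc)

# ...and a bool array silently stores it as True, because NaN is nonzero
flags = np.array([False, False])
flags[0] = np.nan
print(flags)
```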

wjandrea · Accepted Answer · 2025-01-13 18:59:07Z

2

One way is to use a mask to find where your NaN values are and post-process the result:

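import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, np.nan, 1], 'b': [np.nan, 1, 1]})

# Cast to float first: a bool Series cannot hold NaN
result = (df['a'] == df['b']).astype(float)
print(result)

# Check where your NaN values are and set them to NaN afterwards
nan_mask = df["a"].isna() | df["b"].isna()

result[nan_mask] = np.nan
print(result)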

# 0    NaN
# 1    NaN
# 2    1.0
# dtype: float64

Note: You cannot have a dtype of int or bool if you want to have NaN values.
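The same mask idea can be wrapped in a small helper that works for any comparison operator, not just ==. This is a sketch: the name nan_aware_compare is hypothetical, and it uses Series.mask (which replaces values with NaN where the condition is True) instead of assigning through a boolean index.

```python
import operator

import numpy as np
import pandas as pd


def nan_aware_compare(left, right, op=operator.eq):
    """Hypothetical helper: compare two Series element-wise, returning
    1.0/0.0 for real comparisons and NaN wherever either side is missing."""
    result = op(left, right).astype(float)
    # Series.mask sets values to NaN where the condition is True
    return result.mask(left.isna() | right.isna())


df = pd.DataFrame({'a': [np.nan, np.nan, 1], 'b': [np.nan, 1, 1]})
eq = nan_aware_compare(df['a'], df['b'])
lt = nan_aware_compare(df['a'], df['b'], operator.lt)
print(eq)
print(lt)
```

Passing the operator as a parameter keeps the NaN-handling in one place, so <, <=, != and friends all get the same treatment.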

edited Jan 13 at 18:59

wjandrea

answered Jan 13 at 18:53

Daraan

wjandrea · Accepted Answer · 2025-01-13 19:34:57Z

2

Pandas has extension dtypes that support three-valued logic for integers and for floats.

You can use them on-demand:

df.astype('Float64').pipe(lambda d: d['a'] == d['b'])

0    <NA>
1    <NA>
2    True
dtype: boolean

Or on df creation:

df = pd.DataFrame(
    {'a': [np.nan, np.nan, 1], 'b': [np.nan, 1, 1]},
    dtype='Int64')
#       a     b
# 0  <NA>  <NA>
# 1  <NA>     1
# 2     1     1

df['a'] == df['b']

0    <NA>
1    <NA>
2    True
dtype: boolean

See also: Nullable integer data type (User Guide)
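The three-valued (Kleene) logic behind these dtypes is defined on the pd.NA scalar itself: comparisons involving it propagate NA, while & and | return a definite result when the other operand already decides it. A quick check:

```python
import pandas as pd

# Any comparison with NA is itself NA
print(pd.NA == 1)      # <NA>

# Kleene logic: the known operand can decide the result on its own
print(pd.NA | True)    # True
print(pd.NA & False)   # False

# ...but only when it actually decides it
print(pd.NA | False)   # <NA>
```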

There's also a nullable boolean dtype, which gives the same result here because every non-missing value in this data is 1.

df.astype('boolean').pipe(lambda d: d['a'] == d['b'])

0    <NA>
1    <NA>
2    True
dtype: boolean

See also: Nullable Boolean data type (User Guide)
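If you later need the float/NaN form from the question (for example to pass the result to NumPy code that does not understand pd.NA), the nullable result can be converted with Series.to_numpy, passing dtype and na_value explicitly. A sketch reusing the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, np.nan, 1], 'b': [np.nan, 1, 1]})

# Nullable comparison: missing values become <NA> in a 'boolean' Series
res = df.astype('Float64').pipe(lambda d: d['a'] == d['b'])

# Map <NA> to NaN and True/False to 1.0/0.0 in a plain float64 array
as_float = res.to_numpy(dtype=float, na_value=np.nan)
print(res)
print(as_float)
```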

edited Jan 13 at 19:34

answered Jan 13 at 19:12

wjandrea

Sithila Sihan Somaratne · Accepted Answer · 2025-01-13 19:01:14Z

You can use numpy.where and pandas.isna to replace the comparisons involving NaN with NaN:

pd.isna returns an element-wise boolean mask of missing values; combining the masks for 'a' and 'b' with | flags every row where either value is NaN.
numpy.where picks values based on a condition: where the mask is True it returns NaN, and everywhere else it returns the result of the equality check.

Here's the full code:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, np.nan, 1], 'b': [np.nan, 1, 1]})

# Rows where either 'a' or 'b' is NaN
either_nan = pd.isna(df['a']) | pd.isna(df['b'])

# NaN where a NaN was involved, otherwise the comparison result (as 1.0/0.0)
comparison = np.where(either_nan, np.nan, df['a'] == df['b'])

# Convert the result back to a Series to get your expected output
result = pd.Series(comparison, index=df.index)

print(result)

As output, you get:

0    NaN
1    NaN
2    1.0
dtype: float64
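Since the title asks about NumPy, the same np.where approach also works on plain arrays, with np.isnan in place of pd.isna. A minimal sketch, assuming the inputs are floating-point arrays (np.isnan does not accept object arrays):

```python
import numpy as np

a = np.array([np.nan, np.nan, 1.0])
b = np.array([np.nan, 1.0, 1.0])

# True wherever either operand is NaN
either_nan = np.isnan(a) | np.isnan(b)

# NaN where a NaN was involved, otherwise the comparison promoted to 1.0/0.0
result = np.where(either_nan, np.nan, a == b)
print(result)
```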

wjandrea · Accepted Answer · 2025-01-13 19:24:42Z

0

If the dtype becoming float is not a concern, then np.ma might be useful for working with this:

(
    np.ma.masked_invalid(df['a']) == np.ma.masked_invalid(df['b'])
).astype(float).filled(np.nan)

This masks the NaN values in both inputs so that any comparison involving one of them is masked as well, then fills the masked results with NaN.
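One caveat: np.ma.masked_invalid masks ±inf as well as NaN, so an inf == inf row would also come out as NaN. If only NaN should propagate, np.ma.masked_where with np.isnan is one alternative. The sketch below compares both and wraps the result back into a Series, since filled() returns a plain NumPy array; the helper name nan_only and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([np.nan, np.inf, 1.0])
s2 = pd.Series([1.0, np.inf, 1.0])

# masked_invalid hides NaN *and* inf, so the inf == inf row is lost too
invalid = (
    np.ma.masked_invalid(s1) == np.ma.masked_invalid(s2)
).astype(float).filled(np.nan)


def nan_only(s):
    """Hypothetical helper: mask only NaN, leaving inf comparable."""
    values = s.to_numpy(dtype=float)
    return np.ma.masked_where(np.isnan(values), values)


nan_masked = (nan_only(s1) == nan_only(s2)).astype(float).filled(np.nan)

# filled() returns a plain ndarray; wrap it to get the index back
result = pd.Series(nan_masked, index=s1.index)
print(invalid)
print(result)
```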

edited Jan 13 at 19:24

wjandrea

answered Jan 13 at 18:51

M. Zhang
