1

I'm doing equality comparisons of some series which have some NaN elements and some numeric elements. I'd like every comparison involving a NaN to return NaN instead of False. What's the best NumPy function to do this?

df = pd.DataFrame({'a': [np.nan, np.nan, 1], 'b': [np.nan, 1, 1]})

df['a'] == df['b']

gives

0    False
1    False
2     True
dtype: bool

when I'd like it to return

0    NaN
1    NaN
2    1
dtype: float

or

0    NaN
1    NaN
2    True
dtype: bool
6
  • 2
    Can you give a specific example? (minimal reproducible example). There are probably many ways to do it depending on the exact context. Commented Jan 13 at 18:21
  • ^ What mozway said. For some specific tips, see How to make good reproducible pandas examples. Commented Jan 13 at 18:30
  • Added, sorry, thought my description was clear enough at first. Thanks! Commented Jan 13 at 18:34
  • 3
    NaN is not a valid value for an array of dtype bool or int. Both of your options would be impossible as they require a dtype of object or float Commented Jan 13 at 18:46
  • Ok fine I'll change my requirement then. I don't actually care about the dtype, as long as there is a NaN and a Falsy value and a Truthy value. Commented Jan 13 at 18:49

4 Answers 4

2

One way is to use a mask to check where your NaN values are and postprocess your result:

result = df['a'] == df['b']
print(result)

# Check where your NaN values are and set them to NaN afterwards
nan_mask = df["a"].isna() | df["b"].isna()

# Cast to float first: a bool Series cannot hold NaN
result = result.astype(float)
result[nan_mask] = float("nan")
print(result)

# 0    NaN
# 1    NaN
# 2    1.0
# dtype: float64

Note: With the default NumPy-backed dtypes, a Series cannot be int or bool if it contains NaN values.
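The same mask idea can be written without in-place assignment. This is only a sketch: `eq_with_nan` is a hypothetical helper name, not a pandas or NumPy function, and it assumes both Series share the same index.

```python
import numpy as np
import pandas as pd


def eq_with_nan(left: pd.Series, right: pd.Series) -> pd.Series:
    """Element-wise equality that yields NaN wherever either operand is NaN.

    Hypothetical helper for illustration; assumes aligned indexes.
    """
    nan_mask = left.isna() | right.isna()
    # Cast to float so the result can hold NaN, then blank out the masked rows.
    return (left == right).astype(float).mask(nan_mask)


df = pd.DataFrame({"a": [np.nan, np.nan, 1], "b": [np.nan, 1, 1]})
result = eq_with_nan(df["a"], df["b"])
print(result)
```

Series.mask replaces the rows where the condition is True with NaN by default, so no explicit assignment is needed.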


2

Pandas has extension dtypes that support three-valued logic for integers and for floats.

  • You can use them on-demand:

    df.astype('Float64').pipe(lambda d: d['a'] == d['b'])
    
    0    <NA>
    1    <NA>
    2    True
    dtype: boolean
    
  • Or on df creation:

    df = pd.DataFrame(
        {'a': [np.nan, np.nan, 1], 'b': [np.nan, 1, 1]},
        dtype='Int64')
    #       a     b
    # 0  <NA>  <NA>
    # 1  <NA>     1
    # 2     1     1
    
    df['a'] == df['b']
    
    0    <NA>
    1    <NA>
    2    True
    dtype: boolean
    

See also: Nullable integer data type (User Guide)


There's also a nullable boolean, which works the same in this particular case.

df.astype('boolean').pipe(lambda d: d['a'] == d['b'])
0    <NA>
1    <NA>
2    True
dtype: boolean

See also: Nullable Boolean data type (User Guide)
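Once the result is a nullable boolean Series, the <NA> entries need some care downstream. A short sketch of a few options, assuming the same example frame as in the question (the variable names here are only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, np.nan, 1], "b": [np.nan, 1, 1]})
res = df["a"].astype("Float64") == df["b"].astype("Float64")

# Rows where the comparison is unknown because an operand was missing
unknown = res.isna()

# Collapse <NA> to False when a plain NumPy bool Series is needed
strict = res.fillna(False).astype(bool)

# Three-valued (Kleene) logic: <NA> | True is True,
# while <NA> | False would stay <NA>
either = res | True

print(res, unknown, strict, either, sep="\n\n")
```

Note that a Series containing <NA> cannot be used directly as a boolean indexer in every context, so filling or testing with isna() first is often the safer route.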


1

You can use numpy.where and pandas.isna to replace the comparisons involving NaN with NaN:

  • pd.isna checks if any of the elements in the columns 'a' or 'b' are NaN.
  • numpy.where allows you to replace values based on a condition. If either element is NaN, it replaces the comparison result with NaN; otherwise, it performs the equality check.

Here's the fixed code:

import numpy as np
import pandas as pd

df = pd.DataFrame(columns=['a', 'b'], index=[0, 1, 2], data={'a': [np.nan, np.nan, 1], 'b': [np.nan, 1, 1]})

# Where either 'a' or 'b' is NaN, emit NaN;
# otherwise emit the result of the equality check.
comparison = np.where(pd.isna(df['a']) | pd.isna(df['b']), np.nan, df['a'] == df['b'])

# Convert the result to a Series in order to have your expected output
result = pd.Series(comparison, index=df.index)

print(result)

As output, you get:

0    NaN
1    NaN
2    1.0
dtype: float64
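If a literal True rather than 1.0 is preferred, as in the second output the question asks for, an object-dtype Series can hold Python booleans and NaN side by side. This is a sketch of one possible variant, not part of the answer above; object dtype gives up vectorized speed, so it is probably only worthwhile for display or small data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, np.nan, 1], "b": [np.nan, 1, 1]})
nan_mask = pd.isna(df["a"]) | pd.isna(df["b"])

# Cast to object first so True/False and NaN can coexist,
# then replace the rows involving NaN (mask fills with NaN by default).
result = (df["a"] == df["b"]).astype(object).mask(nan_mask)

print(result)
```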


0

If the dtype becoming float is not a concern, then np.ma might be useful for working with this:

(
    np.ma.masked_invalid(df['a']) == np.ma.masked_invalid(df['b'])
).astype(float).filled(np.nan)

This masks nan in the comparison, then replaces masked values back with nan.
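The expression above returns a plain NumPy array, so the index is lost. A minimal sketch of wrapping it back into a Series, assuming the example frame from the question and converting the columns with to_numpy() before masking:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, np.nan, 1], "b": [np.nan, 1, 1]})

# Mask NaN/inf in each operand; the comparison result is masked
# wherever either operand was masked.
masked_eq = (
    np.ma.masked_invalid(df["a"].to_numpy())
    == np.ma.masked_invalid(df["b"].to_numpy())
)

# Cast to float so the masked slots can be filled with NaN.
values = masked_eq.astype(float).filled(np.nan)

# Reattach the original index.
result = pd.Series(values, index=df.index)
print(result)
```

Keep in mind that masked_invalid also masks infinities, so comparisons involving inf come back as NaN too with this approach.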
