So, I was working on a pipeline, and I stumbled upon this error when fitting it:

Traceback (most recent call last):
  File "C:/Users/Shawn/Documents/temp/bool_issue.py", line 7, in <module>
    _assert_all_finite(array, False)
  File "C:\Users\Shawn\AppData\Local\Programs\Python\Python38\lib\site-packages\sklearn\utils\validation.py", line 103, in _assert_all_finite
    if _object_dtype_isnan(X).any():
AttributeError: 'bool' object has no attribute 'any'

This traceback actually comes from some custom code I wrote to test the issue (see below).

Following the traceback, I see that _object_dtype_isnan() takes a numpy array, and returns another numpy array, in the form of a boolean mask (an array of booleans).
However, for some reason, it sometimes returns a boolean directly instead.

Code to reproduce the error:

import numpy as np
import pandas as pd
from sklearn.utils.validation import _assert_all_finite

bad_array = np.array(['F', 'F', 'M', 'F', 'M', pd.NA, 'F', 'M'], dtype='object')

_assert_all_finite(bad_array, False)  # Raises AttributeError

2 Answers

6

After further investigation, I found out that it was because some pd.NA values had gotten into my dataset.
Replacing them with None works just fine!

# For my original pandas DataFrame
X.replace(to_replace=pd.NA, value=None, inplace=True)

From my understanding (I didn't check, I'm just guessing), numpy can't complete the elementwise comparison because of the pd.NA objects inside the array, so it falls back to comparing the arrays as a whole and returns a single boolean.
Also, building a mask by hand when pd.NA is present seems to be a mess:

>>> array = np.array(['F', 'F', 'M', 'F', 'M', pd.NA, 'F', 'M'], dtype='object')
>>> mask = np.equal(array, np.array(['F', ] * len(array)))
Traceback (most recent call last):
  File "C:/Users/Shawn/Documents/temp/bool_issue.py", line 7, in <module>
    mask = np.equal(array, np.array(['F', ] * len(array)))
  File "pandas\_libs\missing.pyx", line 360, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous
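
The TypeError above can be reproduced without numpy at all. This is a minimal sketch, assuming a pandas version that provides pd.NA (1.0 or later): comparing pd.NA with anything yields pd.NA rather than True or False, and converting that result to a bool is what raises. The variable names are only illustrative.

```python
import pandas as pd

# Comparing pd.NA with anything (even itself) yields pd.NA, not a bool.
result = pd.NA != pd.NA
is_na_result = result is pd.NA

# Forcing that result into a bool is what raises the TypeError.
try:
    bool(result)
    raised = False
except TypeError:
    raised = True
```

This is consistent with the guess above: any code path that needs a plain True/False for each element will trip over pd.NA.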

Therefore, if you have a trick for replacing them in a numpy array, please share!
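
One possible trick, sketched here rather than taken from the original post: build the mask with pd.isna instead of a comparison, then assign through the mask. It assumes pd.isna accepts an object-dtype numpy array and returns a boolean array of the same shape; the names array, mask and cleaned are only illustrative.

```python
import numpy as np
import pandas as pd

array = np.array(['F', 'F', 'M', 'F', 'M', pd.NA, 'F', 'M'], dtype='object')

# pd.isna checks each element for a missing value (None, NaN, pd.NA)
# without ever calling bool() on pd.NA.
mask = pd.isna(array)

# Work on a copy so the original array is left untouched.
cleaned = array.copy()
cleaned[mask] = None
```

Whether None or np.nan is the better replacement depends on what the downstream estimator or imputer expects.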

This issue is not directly linked to scikit-learn but rather to the way numpy works. Still, that's how I found it, so I'll tag it anyway :shrug: :)


3

So, I had the same exact error once again recently, but this time, it didn't work with None.

Long story short, updating scikit-learn from 0.23.2 to 0.24.2 solved the issue :)
