

@parthkandharkar

This pull request fixes an inconsistency in how BooleanArray handles logical operations when the right-hand operand is a list-like containing pd.NA. The root cause is in BooleanArray._logical_method, which coerces list-like operands with np.asarray(other, dtype="bool"). When other contains pd.NA, NumPy has to evaluate bool(pd.NA), which always raises TypeError because the truth value of NA is ambiguous. As a result, pandas never gets to build the boolean data array and the mask that represent missing values in a BooleanArray.
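A minimal reproduction of the root cause, assuming a recent NumPy and pandas. try_bool_cast is a hypothetical helper that mimics only the old np.asarray(other, dtype="bool") line, not the whole _logical_method:

```python
import numpy as np
import pandas as pd


def try_bool_cast(values):
    """Cast to a NumPy bool array the way the old code did, reporting failures."""
    try:
        return np.asarray(values, dtype="bool")
    except TypeError as exc:
        # bool(pd.NA) raises, so any list-like containing pd.NA ends up here.
        return f"TypeError: {exc}"


print(try_bool_cast([True, False]))   # a plain bool list casts fine
print(try_bool_cast([pd.NA, False]))  # pd.NA makes the cast fail

# A scalar pd.NA operand never reaches the list-like branch, so the
# element-wise Kleene logic applies: True & NA -> NA, False & NA -> False.
b = pd.array([True, False], dtype="boolean")
print(b & pd.NA)
```

This is the inconsistency from the linked issue in miniature: the scalar pd.NA works, while the same value wrapped in a list fails before any element-wise logic runs.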

The fix replaces the unsafe boolean cast with a conversion that preserves pd.NA. Instead of forcing dtype="bool", the code now converts the input to an object array and passes it through coerce_to_array, which builds both the underlying boolean data and the mask of missing entries. This lets pandas evaluate expressions such as b & [pd.NA, False], where b is a BooleanArray, with its normal element-wise (Kleene) logic and return the correct result.
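The data-plus-mask idea can be sketched with public APIs only. This is not pandas' internal coerce_to_array or its Kleene kernels: to_data_and_mask and kleene_and are hypothetical helpers written for illustration, and pd.arrays.BooleanArray(values, mask) is used only to rebuild a result for comparison:

```python
import numpy as np
import pandas as pd


def to_data_and_mask(values):
    """Hypothetical helper: split a list-like into (bool data, NA mask)."""
    obj = np.asarray(values, dtype=object)        # object dtype never calls bool(pd.NA)
    mask = np.asarray(pd.isna(obj), dtype=bool)   # True where the value is missing
    data = np.zeros(len(obj), dtype=bool)
    data[~mask] = obj[~mask].astype(bool)         # only non-missing values are cast
    return data, mask


def kleene_and(left_data, left_mask, right_data, right_mask):
    """Hypothetical three-valued AND: a known False wins, otherwise NA propagates."""
    data = left_data & right_data
    known_false = (~left_data & ~left_mask) | (~right_data & ~right_mask)
    mask = (left_mask | right_mask) & ~known_false
    return data, mask


b = pd.array([True, False], dtype="boolean")
left_data = b.to_numpy(dtype=bool, na_value=False)
left_mask = np.asarray(b.isna(), dtype=bool)

right_data, right_mask = to_data_and_mask([pd.NA, False])
# True & NA -> NA, False & False -> False
out = pd.arrays.BooleanArray(*kleene_and(left_data, left_mask, right_data, right_mask))
print(out)
```

The result should match b & pd.array([pd.NA, False], dtype="boolean"), which already works because its right-hand side is a BooleanArray with its own mask.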

Member

@rhshadrach rhshadrach left a comment


Thanks for the PR!

Comment on lines 397 to 403
```diff
 if isinstance(other, BooleanArray):
     other, mask = other._data, other._mask
 elif is_list_like(other):
-    other = np.asarray(other, dtype="bool")
+    other = np.asarray(other, dtype=object)
     if other.ndim > 1:
         return NotImplemented
     other, mask = coerce_to_array(other, copy=False)
```
Member


object dtype comes with performance penalties; I'm thinking we should build a BooleanArray here instead, e.g.:

```python
other = BooleanArray._from_sequence(other)
other, mask = other._data, other._mask
```

cc @jbrockmendel

Member


I think that risks casting non-bool items to bools. pd.array might be safer?
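The casting concern can be illustrated with public APIs. This sketch assumes the goal is to accept only input that pandas itself infers as boolean; as_boolean_operand is a hypothetical helper, not something in pandas:

```python
import pandas as pd


def as_boolean_operand(values):
    """Hypothetical helper: let pd.array infer a dtype and reject non-boolean input."""
    arr = pd.array(values)  # no dtype given, so pandas infers one
    if not isinstance(arr.dtype, pd.BooleanDtype):
        raise TypeError(f"expected boolean-like values, got dtype {arr.dtype}")
    return arr


# Forcing the boolean dtype accepts 0/1 and silently turns them into bools.
forced = pd.array([0, 1], dtype="boolean")
print(forced)

# Inference keeps integers as Int64, so the helper can refuse them.
print(as_boolean_operand([pd.NA, False]))
try:
    as_boolean_operand([0, 1])
except TypeError as exc:
    print(exc)
```

Inference keeps [pd.NA, False] as a boolean array with a mask while leaving integer input identifiable, which is the distinction a forced boolean dtype loses.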

But I'm not sure we want to support this. For non-EA (non-ExtensionArray) cases, we made a choice years ago to raise on &, |, and ^ when given a dtype-less sequence.
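Under the Series convention described here, the caller rather than pandas decides the operand's dtype. A sketch of that pattern, assuming callers wrap lists themselves; what s & [pd.NA, False] does on a bare list depends on the pandas version, so it is not shown:

```python
import pandas as pd

s = pd.Series([True, False], dtype="boolean")
other = [pd.NA, False]

# Give the operand an explicit dtype instead of passing a bare list,
# so no dtype has to be guessed from a dtype-less sequence.
result = s & pd.array(other, dtype="boolean")
print(result)
```

With an explicit boolean operand, pd.NA is carried in the mask and Kleene logic applies element-wise: True & NA gives NA and False & False gives False.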

Member


I'm +1 on aligning with Series here and deprecating this case entirely.



Development

Successfully merging this pull request may close these issues.

BUG: Inconsistency of BooleanArray.__and__ with pd.NA and [pd.NA]

3 participants