
I get a PEP8 complaint about numpy.where(mask == False), where mask is a boolean array. PEP8 recommends writing the comparison as either 'if condition is False' or 'if not condition'. What is the Pythonic syntax for the suggested comparison inside numpy.where()?

  • What PEP8 tester are you using? PEP8 is a general Python style recommendation. It's not adapted to numpy. Your expression looks perfectly fine to me. Commented Apr 7, 2017 at 23:28
  • mask==False is the same as ~mask, but quite different from mask is False or not mask. Commented Apr 7, 2017 at 23:35
  • @hpaulj I use pycharm and its native code inspection is I believe using pep8 v '1.7.0' Commented Apr 11, 2017 at 23:10
  • github.com/PyCQA/pycodestyle/issues/450 is a github issue about this problem; and basically the same information as here. In numpy this == is valid, and cannot be replaced with a pep8 compliant form. So ignore the complaint. Commented Apr 12, 2017 at 2:31
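
The distinction drawn in the comments above can be sketched as follows. This is a minimal illustration, assuming a small hand-made boolean array; the names mask, identity_result and raised are made up for the example.

```python
import numpy as np

# Hypothetical example mask; any multi-element boolean array behaves the same.
mask = np.array([True, False, True, False])

# Elementwise comparison and elementwise negation give the same boolean array.
assert np.array_equal(mask == False, ~mask)  # noqa: E712

# `mask is False` is an identity test on the array object itself,
# so it yields a single Python bool rather than an elementwise result.
identity_result = mask is False

# `not mask` asks for the truth value of the whole array, which NumPy
# refuses for arrays with more than one element.
try:
    not mask
    raised = None
except ValueError as exc:
    raised = exc

print(identity_result)
print(type(raised).__name__)
```

So neither form suggested by the style checker is a drop-in replacement for mask == False when mask is an array.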

1 Answer

To negate a boolean mask array in NumPy, use ~mask, so the call becomes numpy.where(~mask).
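
A minimal sketch of that replacement, assuming a small example mask (the variable names are illustrative only). It also shows np.logical_not, which gives the same result for boolean input:

```python
import numpy as np

# Hypothetical boolean mask for illustration.
mask = np.array([True, False, False, True, False])

# Indices where the mask is False, written three equivalent ways.
false_positions = np.where(~mask)
by_comparison = np.where(mask == False)  # noqa: E712 (the form the checker flags)
by_logical_not = np.where(np.logical_not(mask))

print(false_positions[0])
```

Note that ~ is only a logical negation when the array's dtype is bool; on an integer array it is a bitwise inversion, so it is worth making sure the mask really is boolean.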

Also, consider whether you actually need where at all. The most common use seems to be some_array[np.where(some_mask)], but that is just a wordier and usually less efficient way to write some_array[some_mask].
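
As a small sketch of that equivalence, with made-up data in some_array and some_mask:

```python
import numpy as np

# Hypothetical data and mask for illustration.
some_array = np.array([10, 20, 30, 40, 50])
some_mask = np.array([True, False, True, False, True])

# Indexing through where() and indexing with the mask select the same elements.
via_where = some_array[np.where(some_mask)]
via_mask = some_array[some_mask]

# The negated mask selects the remaining elements, again without where().
not_selected = some_array[~some_mask]

print(via_where, via_mask, not_selected)
```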


5 Comments

Boolean indexing takes the same amount of time as the where version. I think that means there's an implicit where. docs.scipy.org/doc/numpy/reference/…
@hpaulj: IIRC, for more complex cases, NumPy does call nonzero, but for simple cases, it bypasses that and uses the boolean mask directly.
@hpaulj: See the array_boolean_subscript code in numpy/core/src/multiarray/mapping.c. The timings I'm getting aren't what I expected, though. On some inputs, where is actually faster!
Digging into the source, I think it's because PyArray_Nonzero and the 1D integer array index case are optimized to not use a NpyIter for the 1D case, while that optimization never made it into array_boolean_subscript. For 1D arrays and non-small proportions of True in the mask, the NpyIter overhead overwhelms the advantage of not needing to create an integer array...
while for 2D and up, or for small proportions of True in the mask, where loses due to the overhead of creating integer index arrays. Now I want to write up a pull request. arr[where(mask)] should not be outperforming arr[mask].
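
For anyone who wants to check the timing claims in these comments on their own machine, here is a rough sketch using timeit. The helper compare, the array shapes and the True fractions are arbitrary choices for illustration, not taken from the discussion, and the results will depend on the NumPy version and hardware.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)


def compare(shape, true_fraction, repeats=20):
    """Time arr[mask] against arr[np.where(mask)] for one configuration."""
    arr = rng.random(shape)
    mask = rng.random(shape) < true_fraction
    # Both forms must select the same elements in the same order.
    assert np.array_equal(arr[mask], arr[np.where(mask)])
    t_mask = timeit.timeit(lambda: arr[mask], number=repeats)
    t_where = timeit.timeit(lambda: arr[np.where(mask)], number=repeats)
    return t_mask, t_where


# Arbitrary 1D and 2D cases with a small and a large proportion of True.
for shape in [(100_000,), (300, 300)]:
    for fraction in (0.01, 0.5):
        t_mask, t_where = compare(shape, fraction)
        print(f"shape={shape} true_fraction={fraction}: "
              f"arr[mask]={t_mask:.5f}s arr[where(mask)]={t_where:.5f}s")
```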
