Fancy Indexing with Boolean Masking | Numpy in Python

Question

I came across this piece of code in the Python Data Science Handbook and have modified it here for readability. It puzzles me because it combines fancy indexing with masking, and I am unable to understand what is happening underneath.

import numpy as np
X = np.arange(12).reshape(3,4)
print("---X----\n",X)
row = np.array([0,1,2])
mask = np.array([1, 0, 1, 0], dtype=bool)
print("\n-----row vector after reshaping ----\n",row[:, np.newaxis])
print("\n ---mask  ----\n",mask)
print("\n ----result-----\n",X[row[:, np.newaxis], mask])

Here is the output:

---X----
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

-----row vector after reshaping ----
 [[0]
 [1]
 [2]]

 ---mask  ----
 [ True False  True False]

 ----result-----
 [[ 0  2]
 [ 4  6]
 [ 8 10]]

I understand that in a case like

X[row[:,np.newaxis],[1,2,3]]

broadcasting kicks in because the first argument has shape (3,1) and the second has shape (3,). Both arguments are broadcast to (3,3), fancy indexing then selects the element at each corresponding position, and the result has the broadcast shape of the index arrays (which is what the fancy indexing docs say).
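A minimal sketch of that integer-only case, rebuilding X and row from the question so it runs on its own (cols is just a name for [1,2,3]). It makes the broadcast index arrays explicit with np.broadcast_arrays and then picks each element by position with a plain loop, which should match the fancy-indexing result.

```python
import numpy as np

X = np.arange(12).reshape(3, 4)
row = np.array([0, 1, 2])
cols = np.array([1, 2, 3])  # the integer column index from the example

# Fancy indexing: the (3, 1) and (3,) index arrays broadcast together.
fancy = X[row[:, np.newaxis], cols]

# The same thing spelled out: broadcast both index arrays to (3, 3) ...
r, c = np.broadcast_arrays(row[:, np.newaxis], cols)
# ... then pick X[r[i, j], c[i, j]] for every position (i, j).
manual = np.array([[X[r[i, j], c[i, j]] for j in range(r.shape[1])]
                   for i in range(r.shape[0])])

print(r.shape, c.shape)  # (3, 3) (3, 3)
print(fancy)
# [[ 1  2  3]
#  [ 5  6  7]
#  [ 9 10 11]]
```

The loop is only there to show which element lands at each output position; it is not how NumPy implements indexing internally.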

But the code I posted at the top perplexes me. From what I can infer, the second argument (mask) is equivalent to [1,0,1,0] of shape (4,), and the first argument would be

[[0],
 [1],
 [2]]

of shape (3,1). In that case, both arguments should be broadcast to (3,4), and the elements picked would give a result of shape (3,4). Yes, I understand that this defeats the purpose of Boolean masking, but we are not doing something like X[mask], where we get the values at the positions where mask is True. In our statement, X[row[:, np.newaxis], mask], the first argument is an integer array and the second is a Boolean array. Does the Boolean array get converted to integers to play along with the first argument, or does the Boolean array first select the columns, which comes out to be:

[[ 0  2]
 [ 4  6]
 [ 8 10]]

and then the first argument is applied to that?
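The two readings in the question can be checked directly. This sketch rebuilds X, row and mask from the question and compares the actual result with (a) casting the mask to the integers [1, 0, 1, 0] via astype(int), and (b) selecting the True columns first and then applying the row index. The names as_ints and cols_first are only illustrative.

```python
import numpy as np

X = np.arange(12).reshape(3, 4)
row = np.array([0, 1, 2])
mask = np.array([1, 0, 1, 0], dtype=bool)

actual = X[row[:, np.newaxis], mask]

# Reading (a): treat the mask as the integer column indices [1, 0, 1, 0].
as_ints = X[row[:, np.newaxis], mask.astype(int)]

# Reading (b): keep the True columns first, then apply the row index.
cols_first = X[:, mask][row]

print(actual.shape, as_ints.shape, cols_first.shape)  # (3, 2) (3, 4) (3, 2)
print(as_ints)
# [[1 0 1 0]
#  [5 4 5 4]
#  [9 8 9 8]]
```

Reading (a) does not reproduce the output, so the mask is not simply cast to 0/1 column numbers; reading (b) gives the same values for this index layout.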

Paul Panzer · Accepted Answer · 2019-06-15 13:39:56Z

You are on the right track: the boolean array does get converted (or can at least be thought of as being converted) to an index. Maybe it's the details of this conversion that confuse you?

Here is the relevant bit from the docs:

In general if an index includes a Boolean array, the result will be identical to inserting obj.nonzero() into the same position and using the integer array indexing mechanism described above. x[ind_1, boolean_array, ind_2] is equivalent to x[(ind_1,) + boolean_array.nonzero() + (ind_2,)].
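The quoted equivalence can be tried on a small 3-D array. The names ind_1, boolean_array and ind_2 are the placeholders from the docs sentence, filled here with arbitrary example values (an assumption for illustration). The mask has exactly two True entries so that its nonzero() result broadcasts with the two-element integer indices.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)                  # x[i, j, k] == 12*i + 4*j + k
ind_1 = np.array([1, 0])                            # integer index for axis 0
boolean_array = np.array([True, False, True])       # mask for axis 1
ind_2 = np.array([3, 1])                            # integer index for axis 2

# Mixed index with a boolean array in the middle ...
lhs = x[ind_1, boolean_array, ind_2]
# ... versus splicing boolean_array.nonzero() == (array([0, 2]),) into its place.
rhs = x[(ind_1,) + boolean_array.nonzero() + (ind_2,)]

print(lhs)                       # [15  9]  i.e. x[1, 0, 3] and x[0, 2, 1]
print(np.array_equal(lhs, rhs))  # True
```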

Now let's simply apply that to your example:

mask.nonzero()
# (array([0, 2]),)

So,

(row[:, None],) + mask.nonzero()
# (array([[0],
#         [1],
#         [2]]), array([0, 2]))

is the effective index. Its two parts, of shape (3, 1) and (2,), broadcast to (3, 2), so the result is 3x2 and all is as expected.
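A short sketch that builds this effective index explicitly for the question's X, row and mask, checks its broadcast shape with np.broadcast, and confirms that indexing with it gives the same result as the original expression.

```python
import numpy as np

X = np.arange(12).reshape(3, 4)
row = np.array([0, 1, 2])
mask = np.array([1, 0, 1, 0], dtype=bool)

# The boolean index is replaced by its nonzero() tuple: (array([0, 2]),).
index = (row[:, None],) + mask.nonzero()

# (3, 1) and (2,) broadcast to (3, 2), which becomes the result's shape.
print(np.broadcast(*index).shape)  # (3, 2)

result = X[index]
print(result)
# [[ 0  2]
#  [ 4  6]
#  [ 8 10]]
print(np.array_equal(result, X[row[:, None], mask]))  # True
```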
