I came across this piece of code in the Python Data Science Handbook and have modified it here for readability. It puzzles me because it combines fancy indexing with a Boolean mask, and I can't work out what is happening underneath.
import numpy as np
X = np.arange(12).reshape(3,4)
print("---X----\n",X)
row = np.array([0,1,2])
mask = np.array([1, 0, 1, 0], dtype=bool)
print("\n-----row vector after reshaping ----\n",row[:, np.newaxis])
print("\n ---mask ----\n",mask)
print("\n ----result-----\n",X[row[:, np.newaxis], mask])
Here is the output:
---X----
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
-----row vector after reshaping ----
 [[0]
 [1]
 [2]]
---mask ----
 [ True False  True False]
----result-----
 [[ 0  2]
 [ 4  6]
 [ 8 10]]
I understand that in an expression such as
X[row[:,np.newaxis],[1,2,3]]
broadcasting kicks in, because the first argument has shape (3,1) and the second has shape (3,). Both arguments are broadcast to (3,3), fancy indexing then picks the element at each pair of (row, column) positions, and the result has the same shape as the broadcast arguments (which is what the fancy indexing docs say).
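To make the integer-only case concrete, here is a minimal sketch that reuses X and row from above. The name cols and the explicit np.broadcast_arrays call are only there for illustration; NumPy does the broadcasting internally when indexing.

```python
import numpy as np

X = np.arange(12).reshape(3, 4)
row = np.array([0, 1, 2])
cols = np.array([1, 2, 3])  # illustrative name for the second index

# Broadcast the two integer index arrays explicitly to see their common shape.
r, c = np.broadcast_arrays(row[:, np.newaxis], cols)
print(r.shape, c.shape)  # both index arrays now have shape (3, 3)

# Fancy indexing pairs r[i, j] with c[i, j] to pick X[r[i, j], c[i, j]].
result = X[row[:, np.newaxis], cols]
print(result)
```

The result has shape (3, 3), the same as the broadcast index arrays, and matches X[r, c] element for element.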
The code I posted earlier is what perplexes me. From what I can infer, the second argument (mask) is equivalent to [1,0,1,0] with shape (4,), and the first argument would be
[[0],
[1],
[2]
]
of shape (3,1).
In that case, both arguments should be broadcast to (3,4), and the elements picked would give a result of shape (3,4). Yes, I understand this defeats the purpose of Boolean masking, but we are not doing something like X[mask], where we get only the values at positions where mask is True.
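A small sketch of this hypothesis, assuming the mask really were treated as the integers [1,0,1,0]. The explicit astype(int) cast and the names as_int and as_bool are mine, added only to compare the two behaviours.

```python
import numpy as np

X = np.arange(12).reshape(3, 4)
row = np.array([0, 1, 2])
mask = np.array([1, 0, 1, 0], dtype=bool)

# Hypothesis: the mask acts as integer column indices [1, 0, 1, 0].
as_int = X[row[:, np.newaxis], mask.astype(int)]
print(as_int.shape)  # (3, 4)
print(as_int)

# What the Boolean mask actually produces.
as_bool = X[row[:, np.newaxis], mask]
print(as_bool.shape)  # (3, 2)
```

The integer version does come out as (3,4), picking columns 1, 0, 1, 0 from each row, whereas the Boolean version is (3,2). So the mask does not appear to be cast to 0/1 integers.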
In our statement X[row[:, np.newaxis], mask],
the first argument is an integer array and the second is a Boolean array. Does the Boolean array get converted to integers to play along with the first argument, or does the Boolean array first select the columns, which comes out to be:
[[ 0  2]
 [ 4  6]
 [ 8 10]]
and then the first argument is applied to that result?
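One way to tell the two readings apart is sketched below. It relies on the NumPy indexing docs, which describe a Boolean index array as behaving like the integer positions returned by its nonzero() method. The names cols, direct, via_nonzero and two_step are illustrative, not from the book.

```python
import numpy as np

X = np.arange(12).reshape(3, 4)
row = np.array([0, 1, 2])
mask = np.array([1, 0, 1, 0], dtype=bool)

# Positions where the mask is True, as integer indices.
cols = mask.nonzero()[0]
print(cols)  # the True positions: columns 0 and 2

direct = X[row[:, np.newaxis], mask]       # the expression in question
via_nonzero = X[row[:, np.newaxis], cols]  # mask replaced by its True positions
two_step = X[:, mask][row]                 # select columns first, then rows

print(np.array_equal(direct, via_nonzero))
print(np.array_equal(direct, two_step))
```

All three give the same (3,2) array here, which suggests the mask is turned into the indices of its True entries (shape (2,)) and then broadcast against the (3,1) row index, rather than being cast to 0/1 values.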