I searched Stack Overflow for a thread answering this question but could not find one. If this is a duplicate, please provide the link.
The use case is very common:
I have two arrays: X, which contains two-dimensional data points, and y, which contains labels that are either 0 or 1.
X has shape (307, 2)
y has shape (307, 1)
I want all rows of X where the corresponding row in y has the value 1.
I tried the following code:
X[y==1]
But it raises the following error:

IndexError: boolean index did not match indexed array along dimension 1; dimension is 2 but corresponding boolean dimension is 1

How can I do that?

  • You could try X[y, :] Commented Aug 21, 2019 at 12:17
  • @MadPhysicist This gives a totally different array -> shape = (307, 1, 2). This is not what I am looking for. Just the rows where the corresponding rows in y have a value of 1 -> shape = (9, 2) Commented Aug 21, 2019 at 12:27
  • @MadPhysicist Also, y is not an array of boolean values, as described in the question. To mask, you have to write a condition, which then raises the IndexError I mentioned in the question. I have since found the reason, which is stated in the answer to this question. Commented Aug 21, 2019 at 12:44
  • X[y.ravel().astype(bool), :] Commented Aug 21, 2019 at 12:57

1 Answer

I have found the following way:

X[np.where(np.any(y==1, axis=1))]

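I also found out that the reason for the error above is that y has two dimensions: y==1 has shape (307, 1), so NumPy tries to match it against the first two dimensions of X, and the second dimension of X (2) does not match the mask's (1). The following code also works and uses a boolean mask, which performs better: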

y = y.reshape(-1)
X[y==1,:]
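
To make the shape issue concrete, here is a minimal sketch using made-up data with the same shapes as in the question. The random values, the seed, and the choice of nine positive labels (taken from the shape mentioned in the comments) are assumptions for illustration only.

```python
import numpy as np

# Hypothetical data with the shapes from the question: X is (307, 2), y is (307, 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(307, 2))
y = np.zeros((307, 1), dtype=int)
y[:9, 0] = 1  # assumption: nine rows carry the label 1

# A (307, 1) boolean mask is matched against both dimensions of X,
# and 1 != 2 along the second dimension, so NumPy raises IndexError.
try:
    X[y == 1]
    raised = False
except IndexError:
    raised = True

# Flattening the labels gives a (307,) mask that selects whole rows.
mask = y.ravel() == 1
selected = X[mask]

print(raised, selected.shape)
```

Using y.ravel() here instead of reassigning y with reshape(-1) leaves the original label array untouched, which may be preferable if the (307, 1) shape is needed elsewhere.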

1 Comment

Since y has a single column, np.any(y==1, axis=1) is equivalent to (y==1).ravel(). The call to where turns this into fancy (integer) indexing, which is much less efficient than just using the boolean mask.
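
A small sketch of the point in this comment, using a tiny made-up X and y (both hypothetical): the where-based version and the plain boolean mask select the same rows, so the extra call only changes how the indexing is carried out.

```python
import numpy as np

# Hypothetical toy data: four points, labels stored as a single column.
X = np.arange(8).reshape(4, 2)
y = np.array([[0], [1], [0], [1]])

row_mask = np.any(y == 1, axis=1)   # boolean array of shape (4,)
flat_mask = (y == 1).ravel()        # same values for a single-column y
same_mask = np.array_equal(row_mask, flat_mask)

via_where = X[np.where(row_mask)]   # integer (fancy) indexing on rows
via_mask = X[flat_mask]             # boolean mask indexing on rows

print(same_mask, np.array_equal(via_where, via_mask))
```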
