I have a NumPy array of shape (31641600, 2) that contains some, possibly many, zero values.

Let's call the array X.

Doing:

print(len(X))
>>> 31641600

But then doing:

X = X[np.nonzero(X)]
print(len(X))
>>> 31919809

I don't understand why the second length is larger. According to the documentation, indexing with np.nonzero should return only the non-zero values, so I expected len(X) to be smaller.

Any ideas? Thank you.

1 Answer

This happens because len(X) returns only the length of X along its first axis (the number of rows). When you do

X = X[np.nonzero(X)]

you get a 1D array containing every non-zero element, so if fewer than 50% of the elements of X are zero, len(X) increases.

Consider:

In [1]: import numpy as np

In [2]: X = np.zeros((42, 2))

In [3]: X[:, 0] = 1

In [4]: X[0, 1] = 1

In [5]: len(X)
Out[5]: 42

In [6]: len(X[np.nonzero(X)])
Out[6]: 43

That's because X[np.nonzero(X)] is an array of 43 ones:

In [7]: X[np.nonzero(X)].shape
Out[7]: (43,)
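
To see why the result is flat, it may help to look at what np.nonzero returns on its own. The sketch below rebuilds the same 42x2 array as above; the names rows, cols and flat are only illustrative and do not appear in the question.

```python
import numpy as np

# Same toy array as in the session above: first column all ones,
# plus a single one in the second column.
X = np.zeros((42, 2))
X[:, 0] = 1
X[0, 1] = 1

# np.nonzero returns a tuple with one index array per dimension.
rows, cols = np.nonzero(X)

# Indexing with that tuple selects individual elements, not rows,
# so the result is one-dimensional.
flat = X[rows, cols]

print(X.shape)     # (42, 2)
print(flat.shape)  # (43,)

# The length of the flat result equals the number of non-zero elements.
print(np.count_nonzero(X) == len(flat))  # True
```

So len() before the indexing counts rows, while len() after it counts non-zero elements, and the two numbers are not directly comparable.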

Update in response to comment: if in fact you want all pairs where the first element is non-zero, you can do:

X = X[X[:, 0] != 0]
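
As a small sketch of that row filter, here is the same idea on a handful of made-up (magnitude, angle) pairs. The data and the names mask and kept are assumptions for illustration, not values from the question.

```python
import numpy as np

# Hypothetical (magnitude, angle) pairs; the values are invented.
X = np.array([
    [0.0, 1.2],   # zero magnitude: dropped
    [2.5, 0.3],
    [0.0, 0.0],   # zero magnitude: dropped
    [1.0, 0.0],   # zero angle but non-zero magnitude: kept
])

# Boolean mask with one entry per row, True where the magnitude is non-zero.
mask = X[:, 0] != 0

# Indexing with a 1-D boolean mask keeps whole rows, so the result stays 2-D.
kept = X[mask]

print(kept.shape)  # (2, 2)
print(kept)
```

Because the mask has one entry per row, each pair is kept or dropped as a unit, and len(kept) can never exceed len(X).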

3 Comments

Haha nice one! Didn't know about this! What I was trying to do, basically, is get rid of all zero entries in my array. My (x, y) values represent magnitudes and angles from some motion vectors; I just need to discard all zero magnitudes, along with their associated angles.
@ClaudiuS Then you can use fancy indexing instead of nonzero. I updated the answer.
Then I can only accept your answer, with my thanks :) I was looking into boolean indexing but was doing it wrong. Thanks again.
