how to return number of certain values in an array using numpy

Question

I need to return the number of unreasonable (NaN or out-of-range) values for the 3rd column, which has 0s and a blank in it. In the real problem I have to deal with a CSV file, but for now I just created an ndarray.

data = np.array([[   1, 2000,  143, 4546], [   2, 1999,  246,    0], [   3, 2008,  190,    ], [   4, 2000,  100,    0]])

I can't even think of where to start.

It would be awesome if someone could help.

Can you be more specific about what you need? For instance, what output do you want from this? – akuiper, commented Sep 20, 2017 at 14:39
That's what I'm trying to understand too; that's just how the question was worded. So I guess I just have to return how many cells in the CSV file are blank, NaN, or 0. – Yun Tae Hwang, commented Sep 20, 2017 at 14:42
It should be data = np.array([[1, 2000, 143, 4546], [2, 1999, 246, 0], [3, 2008, 190, np.nan], [4, 2000, 100, 0]]) – RagingRoosevelt, commented Sep 20, 2017 at 14:47
But some of the real data do have blanks like that, no? – Yun Tae Hwang, commented Sep 20, 2017 at 14:50
If you tell NumPy that it's a numerical array, it can't store a blank, since that implies "", which is a string. The blank would have to be interpreted as NaN. – RagingRoosevelt, commented Sep 20, 2017 at 15:07
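To check the point about blanks, here is a minimal sketch that reads CSV text containing an empty field with `np.genfromtxt`. It assumes the real file is comma-separated with no header line; `io.StringIO` only stands in for the file. With the default float dtype, `genfromtxt` treats the empty field as missing and fills it with NaN, which is exactly the representation the counting below relies on.

```python
import io

import numpy as np

# Stand-in for the real CSV file (assumed comma-separated, no header line).
# The third row ends with an empty field, i.e. a "blank" cell.
csv_text = "1,2000,143,4546\n2,1999,246,0\n3,2008,190,\n4,2000,100,0"

# genfromtxt defaults to dtype=float; empty fields count as missing
# and are filled with NaN.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

print(data.shape)           # (4, 4)
print(np.isnan(data[2, 3])) # True
```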

RagingRoosevelt · Accepted Answer · 2017-09-20 15:09:33Z


First, you need to be able to access just the column that you're interested in. Do this with a slice:

data[:,2] # grab all rows, and just the column with index 2
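For reference, here is what the slices look like on the corrected array from the comments (with `np.nan` in place of the blank). Indexing starts at 0, so `data[:, 2]` is the third column and `data[:, 3]` is the last one. The variable names are only illustrative.

```python
import numpy as np

# Corrected sample data from the comments: np.nan replaces the blank,
# so NumPy builds a float array.
data = np.array([[1, 2000, 143, 4546],
                 [2, 1999, 246, 0],
                 [3, 2008, 190, np.nan],
                 [4, 2000, 100, 0]])

third_col = data[:, 2]  # index 2 -> the third column
last_col = data[:, 3]   # index 3 -> the fourth (last) column

print(third_col.tolist())  # [143.0, 246.0, 190.0, 100.0]
print(last_col.tolist())   # [4546.0, 0.0, nan, 0.0]
```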

Now you want to count the occurrences that are NaN:

np.count_nonzero(np.isnan(data[:,2]))
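To see what `np.isnan` produces before it is counted, here is a small sketch on the last column of the corrected sample data (the third column has no NaN, so its mask would be all False).

```python
import numpy as np

last_col = np.array([4546, 0, np.nan, 0])  # last column of the sample data

# np.isnan returns a boolean mask with True wherever the value is NaN.
nan_mask = np.isnan(last_col)
print(nan_mask.tolist())           # [False, False, True, False]
print(np.count_nonzero(nan_mask))  # 1
```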

And we want to count the number of zero elements:

data[:,2].size - np.count_nonzero(data[:,2])
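An equivalent way to count zeros, offered as an alternative rather than the answer's own method, is to compare against 0 directly and count the `True` results. Both approaches agree on the sample's last column.

```python
import numpy as np

last_col = np.array([4546, 0, np.nan, 0])  # last column of the sample data

# Approach from the answer: total size minus the number of non-zero entries.
zeros_by_subtraction = last_col.size - np.count_nonzero(last_col)

# Alternative: build a mask of exact zeros and count it.
# NaN == 0 is False, so the NaN entry is not included.
zeros_by_mask = np.count_nonzero(last_col == 0)

print(zeros_by_subtraction, zeros_by_mask)  # 2 2
```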

And if we add those together:

data[:,2].size - np.count_nonzero(data[:,2]) + np.count_nonzero(np.isnan(data[:,2]))
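The combined expression can be wrapped in a small helper so it is easy to apply to any column. `count_zero_or_nan` is a hypothetical name, not a NumPy function; it just repeats the arithmetic above.

```python
import numpy as np


def count_zero_or_nan(col):
    """Count entries of a 1-D float array that are 0 or NaN (hypothetical helper)."""
    zeros = col.size - np.count_nonzero(col)  # NaN counts as non-zero here
    nans = np.count_nonzero(np.isnan(col))
    return zeros + nans


data = np.array([[1, 2000, 143, 4546],
                 [2, 1999, 246, 0],
                 [3, 2008, 190, np.nan],
                 [4, 2000, 100, 0]])

print(count_zero_or_nan(data[:, 2]))  # 0
print(count_zero_or_nan(data[:, 3]))  # 3
```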

This is boring, though, since the 3rd column doesn't have any 0 or NaN in it. Let's try the last column:

>>> slice = data[:,3]
>>> slice.size - np.count_nonzero(slice) + np.count_nonzero(np.isnan(slice))
3
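The question also mentions out-of-range values. Assuming "reasonable" means a value inside some known bounds, a sketch could flag NaN and anything outside `[low, high]` in a single mask. The helper name and the bounds below are hypothetical; pick bounds that fit the real data. Choosing `low=1` makes 0 count as unreasonable, which reproduces the count of 3 above.

```python
import numpy as np


def count_unreasonable(col, low, high):
    """Count NaN values plus values outside [low, high] (hypothetical helper)."""
    # Comparisons with NaN are False; errstate silences any "invalid value"
    # warning that some NumPy versions emit for them.
    with np.errstate(invalid="ignore"):
        out_of_range = (col < low) | (col > high)
    return np.count_nonzero(np.isnan(col) | out_of_range)


last_col = np.array([4546, 0, np.nan, 0])

# Hypothetical bounds: anything from 1 to 10000 is treated as reasonable.
print(count_unreasonable(last_col, low=1, high=10000))  # 3
```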

Edit: I should explain why this works:

`np.isnan(data[:,2])` gives an array of True and False depending on whether each element is NaN. True, when treated as a number, is converted to 1 and False is converted to 0, so the `np.count_nonzero` call counts the number of 1s, which represent the NaN values.
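A quick sketch of the True-as-1, False-as-0 behaviour on the sample's last column: casting the mask to integers makes the 1s visible, and summing the mask gives the same count as `np.count_nonzero`.

```python
import numpy as np

last_col = np.array([4546, 0, np.nan, 0])
nan_mask = np.isnan(last_col)

# Viewed as integers, True becomes 1 and False becomes 0.
print(nan_mask.astype(int).tolist())  # [0, 0, 1, 0]

# So summing the mask and counting its non-zero (True) entries agree.
print(nan_mask.sum(), np.count_nonzero(nan_mask))  # 1 1
```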

`np.count_nonzero(data[:,2])` counts the non-zero elements directly (NaN counts as non-zero, since NaN != 0, so NaN entries are never mistaken for zeros). If we subtract the number of non-zero elements from the total number of elements, we get the number of 0s.
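A small check of the subtraction on a made-up column with two NaNs, one 0 and one ordinary number. Because `np.count_nonzero` treats NaN as non-zero, each NaN is picked up only by the `np.isnan` term and each 0 only by the subtraction, so no cell is counted twice.

```python
import numpy as np

col = np.array([np.nan, np.nan, 0.0, 5.0])

print(np.count_nonzero(col))  # 3  (both NaNs and the 5.0)

zeros = col.size - np.count_nonzero(col)
nans = np.count_nonzero(np.isnan(col))
print(zeros, nans)            # 1 2
```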

answered Sep 20, 2017 at 14:53 by RagingRoosevelt (edited Sep 20, 2017 at 15:09)



2 Comments

Yun Tae Hwang Over a year ago

Hey, thank you so much. I tried this code on my real CSV data and tested it on several columns that have 0s in them, but I am getting one more than the actual count. So if there are 54 0s in one column, I somehow get 55.

RagingRoosevelt Over a year ago

Try running the three parts by themselves to see which one gives the wrong number: slice.size, then np.count_nonzero(slice), then np.count_nonzero(np.isnan(slice)).
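Following that suggestion, here is a sketch that prints the three parts separately. One possible cause of an off-by-one count, assuming the file was loaded with `np.genfromtxt` and starts with a header line, is that the header becomes a row of NaN (its text can't be converted to float); `skip_header=1` drops it. This is a guess, not something confirmed in the thread, and the CSV content and column names below are made up.

```python
import io

import numpy as np

# Hypothetical CSV with a header line; the real file's layout is unknown.
csv_text = "id,year,a,b\n1,2000,143,4546\n2,1999,0,0\n3,2008,0,\n4,2000,100,0"

for skip in (0, 1):
    data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=skip)
    col = data[:, 2]
    # Print the three parts on their own to see which one is off.
    print("skip_header =", skip)
    print("  size:         ", col.size)
    print("  count_nonzero:", np.count_nonzero(col))
    print("  NaN count:    ", np.count_nonzero(np.isnan(col)))
```

With the header left in, this column reports one NaN even though its data rows contain none, which adds exactly one to the combined count.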
