I need to return the number of unreasonable values (NaN or out of range) in the 3rd column, which has 0s and a blank in it. In the real problem I have to deal with a CSV file, but for now I just created an ndarray.

data = np.array([[   1, 2000,  143, 4546], [   2, 1999,  246,    0], [   3, 2008,  190,    ], [   4, 2000,  100,    0]])

I can't even think where I should start.

It would be awesome if someone could help.

  • Can you be more specific about what you need? What's the output you want from this, for instance? Commented Sep 20, 2017 at 14:39
  • That's what I'm trying to understand too. That's just how the question was worded. So I guess I just have to return how many cells in the CSV file are either blank, NaN, or 0. Commented Sep 20, 2017 at 14:42
  • It should be data = np.array([[ 1, 2000, 143, 4546], [ 2, 1999, 246, 0], [ 3, 2008, 190, np.nan ], [ 4, 2000, 100, 0]]) Commented Sep 20, 2017 at 14:47
  • But some of the real data do have blanks like that, no? Commented Sep 20, 2017 at 14:50
  • If you tell numpy that it's a numerical array, it can't then store a blank since that implies "" which is a string. The blank would have to be interpreted as a NaN. Commented Sep 20, 2017 at 15:07

1 Answer

First, you need to be able to access just the column that you're interested in. Do this with a slice:

data[:,2] # grab all rows, and just the column with index 2

Now you want to count the occurrences that are NaN:

np.count_nonzero(np.isnan(data[:,2]))

And we want to count the number of zero elements:

data[:,2].size - np.count_nonzero(data[:,2])

And if we add those together:

data[:,2].size - np.count_nonzero(data[:,2]) + np.count_nonzero(np.isnan(data[:,2]))

This is boring, though, since the 3rd column doesn't have any 0 or NaN in it. Let's try with the last column:

>>> slice = data[:,3]
>>> slice.size - np.count_nonzero(slice) + np.count_nonzero(np.isnan(slice))
3
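
Put together as a runnable script, the steps above might look like the sketch below. It assumes the blank in the question's sample data is written as np.nan (as suggested in the comments), and the helper name count_zero_or_nan is made up for illustration.

```python
import numpy as np

# Sample data from the question, with the blank cell written as np.nan
# (an assumption; a numeric array cannot hold an empty string).
data = np.array([
    [1, 2000, 143, 4546],
    [2, 1999, 246, 0],
    [3, 2008, 190, np.nan],
    [4, 2000, 100, 0],
])


def count_zero_or_nan(column):
    """Count the entries of a 1-D float array that are 0 or NaN.

    Hypothetical helper, not part of NumPy.
    """
    zeros = column.size - np.count_nonzero(column)  # NaN counts as non-zero
    nans = np.count_nonzero(np.isnan(column))
    return zeros + nans


third = count_zero_or_nan(data[:, 2])  # 3rd column: no 0 or NaN
last = count_zero_or_nan(data[:, 3])   # last column: two 0s and one NaN
print(third, last)
```

The name column is used here rather than slice so that Python's built-in slice is not shadowed.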

Edit: I should explain why this works:

np.isnan(data[:,2]) gives an array of True and False based on whether each element is NaN or not. True, when treated as a number, is converted to 1 and False is converted to 0, so the np.count_nonzero call counts the number of 1s, which represent the NaN values.

np.count_nonzero(data[:,2]) counts the number of non-zero elements directly. If we subtract the number of non-zero elements from the total number of elements, we'll get the number of 0s. NaN is not equal to 0, so it counts as non-zero here and is not counted twice.
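
To see the intermediate values, here is a small sketch that works on the last column of the sample data (with the blank assumed to be np.nan). The variable names are only illustrative.

```python
import numpy as np

# Last column of the sample data, blank assumed to be NaN
col = np.array([4546, 0, np.nan, 0])

nan_mask = np.isnan(col)            # boolean array, True only where NaN
n_nan = np.count_nonzero(nan_mask)  # True is counted as 1, False as 0

n_nonzero = np.count_nonzero(col)   # counts 4546 and the NaN
n_zero = col.size - n_nonzero       # what is left must be the 0s

print(nan_mask)
print(n_nan, n_nonzero, n_zero)

# An equivalent single mask, shown as an alternative
n_bad = np.count_nonzero((col == 0) | nan_mask)
print(n_bad)
```

The single-mask version gives the same total and may be easier to extend if other conditions, such as an out-of-range check, need to be added.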

2 Comments

Hey, thank you so much. I tried this code on my real CSV data and tested it on several columns that have 0s in them, but I am getting one more than the actual count. So if there are 54 0s in one column, I somehow get 55.
Try running the three parts by themselves to see which one is giving the wrong number. So slice.size, then np.count_nonzero(slice), then np.count_nonzero(np.isnan(slice)).
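
For the CSV case, checking the three parts separately might look like the sketch below. The thread does not say how the file is loaded, so this assumes np.genfromtxt, which reads an empty numeric field as NaN; the inline csv_text stands in for a real file.

```python
import io

import numpy as np

# Stand-in for the real CSV file; the third row has a blank last field.
csv_text = "1,2000,143,4546\n2,1999,246,0\n3,2008,190,\n4,2000,100,0\n"

# Assumption: the data is loaded with genfromtxt, which fills
# empty float fields with NaN by default.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

col = data[:, 3]
total = col.size                         # number of rows read
nonzero = np.count_nonzero(col)          # includes any NaN
nans = np.count_nonzero(np.isnan(col))   # blanks parsed as NaN

print(total, nonzero, nans)
print(total - nonzero + nans)
```

Comparing total with the number of data rows expected in the file is a quick way to check whether an extra row was read.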
