
I am reading data (pixel values, to be exact) from an HDF5 (.h5) file and plotting it in a histogram using numpy. My array of pixel values contains a no-data value, 99999 (otherwise my data ranges from -40 to 20). I create the histogram using a min and max that I set manually (-40 and 20 respectively), so the no-data value doesn't show up in the histogram, which is fine. However, I want to fit a normal curve over my data, and for this I need the mean and SD of the dataset. When I compute these with numpy.mean and numpy.std, the no-data value is included, so my mean and SD are way off, and so is the resulting normal curve.

Essentially, is there a way to compute the mean and SD of an array while ignoring a given value (i.e. my no-data value, 99999), or alternatively to copy the values of my array into a new array without the no-data value?

Thanks


2 Answers


Sounds like you should be storing your data in a masked array instead of using this hacky 99999 no-data value. Start by looking at np.ma, NumPy's masked array module.

Simple example:

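>>> import numpy as np
>>> a = np.array([1, 2, 99999, 3])
>>> a.mean()  # the no-data value is included in the mean
25001.25
>>> a_ = np.ma.masked_array(a, a == 99999)  # mask is True where a equals the no-data value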
>>> a_.mean()
2.0
>>> a_
masked_array(data = [1 2 -- 3],
             mask = [False False  True False],
       fill_value = 999999)
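The same masked-array idea also answers the SD part of the question. The sketch below is a minimal illustration, assuming the pixels are already loaded into a float array (the values are made up, not read from the .h5 file); it uses np.ma.masked_equal, a shortcut for masking every element equal to the sentinel, and compressed() to get a plain array of the valid values.

```python
import numpy as np

NODATA = 99999  # no-data sentinel from the question

# Hypothetical pixel values standing in for data read from the .h5 file
pixels = np.array([-12.5, 3.0, NODATA, -7.5, 15.0, NODATA], dtype=float)

# Mask every element equal to the no-data value
masked = np.ma.masked_equal(pixels, NODATA)

mean = masked.mean()  # masked elements are ignored
std = masked.std()    # population SD (ddof=0), same default as np.std

# compressed() returns a plain ndarray holding only the unmasked values
valid = masked.compressed()

print(mean, std, valid)
```

Note that std() uses ddof=0 like np.std; pass ddof=1 if you want the sample SD instead.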

Would it work for you to go through the data first, save the useful values in another list (or whatever structure you use), and then process only that new list?

Or try the solution in this related question: How to count values in a certain range in a Numpy array?
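In NumPy, "save the useful data in another structure" is usually done with boolean indexing rather than a Python loop. The sketch below shows three variants under the same made-up data as before: filtering out the sentinel, keeping only values inside the question's -40 to 20 range (the range-based idea from the linked question), and, as a swapped-in alternative, replacing the sentinel with NaN so np.nanmean and np.nanstd skip it.

```python
import numpy as np

NODATA = 99999  # no-data sentinel from the question

# Hypothetical pixel values; the real ones come from the .h5 file
pixels = np.array([-12.5, 3.0, NODATA, -7.5, 15.0, NODATA], dtype=float)

# Option 1: keep only elements that differ from the sentinel
valid = pixels[pixels != NODATA]

# Option 2: keep only elements inside the expected data range,
# which also drops the sentinel since it lies far outside it
in_range_mask = (pixels >= -40) & (pixels <= 20)
in_range = pixels[in_range_mask]
n_in_range = np.count_nonzero(in_range_mask)  # how many values are in range

# Option 3: turn the sentinel into NaN and use the NaN-aware reductions
as_nan = np.where(pixels == NODATA, np.nan, pixels)
nan_mean = np.nanmean(as_nan)
nan_std = np.nanstd(as_nan)

print(valid.mean(), valid.std(), n_in_range, nan_mean, nan_std)
```

All three give the same mean and SD here; option 2 is the more defensive choice if the file might contain other out-of-range junk values besides 99999.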

1 Comment

Thanks, it looks like that would work, but I've managed it using a = [x for x in a if x != 99999]
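Once the no-data value is gone, the original goal (a normal curve over the histogram) needs only the filtered mean and SD. The sketch below assumes made-up pixel values, 12 bins over the question's manual -40 to 20 range, and the population SD; it evaluates the normal PDF at the bin centres and scales it from density to counts (multiplying by the number of values and the bin width) so it matches an unnormalized histogram.

```python
import numpy as np

NODATA = 99999  # no-data sentinel from the question

# Hypothetical pixel values; in the question these come from an .h5 file
a = [-12.5, 3.0, NODATA, -7.5, 15.0, NODATA]

# The list-comprehension filter from the comment above
a = [x for x in a if x != NODATA]

# np.mean / np.std accept a plain list; converting once is convenient
values = np.asarray(a, dtype=float)
mu = values.mean()
sigma = values.std()

# Histogram over the same manual range as in the question (bin count assumed)
counts, edges = np.histogram(values, bins=12, range=(-40, 20))
centers = (edges[:-1] + edges[1:]) / 2
bin_width = edges[1] - edges[0]

# Normal PDF at the bin centres, scaled from density to expected counts
pdf = np.exp(-0.5 * ((centers - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
expected_counts = pdf * values.size * bin_width

print(mu, sigma, counts, expected_counts)
```

If you plot with matplotlib, centers and expected_counts can be drawn as a line over the bars of the same histogram; alternatively, use density=True in the histogram and plot pdf unscaled.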
