
I have a dataset in the following format:

[[ 226   600  3.33   915.   92.6  98.6 ]
 [ 217   700  3.34   640.   93.7  98.5 ]
 [ 213   900  3.35   662.   88.8  96.  ]
 ...
 [ 108   600  2.31   291.   64.   70.4 ]
 [ 125   800  3.36  1094.   65.5  84.1 ]
 [ 109   400  2.44   941.   52.3  68.7 ]]

Each column is a separate criterion with its own value range. How can I replace values that are 0 with a value greater than zero, based on that column's range? In other words, with the worst (smallest) value in the column other than 0.

I have written the following code, but it can only change each 0 to the column minimum (which is, of course, 0) or to the column maximum, which varies by column. Thanks for your help!

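import numpy as np

# Impute 0 values -- give them the worst value for that column
# (scores is the dataset array shown above)
I, J = np.nonzero(scores == 0)
scores[I, J] = scores.min(axis=0)[J]  # can only do min or max; min() is 0 because the zeros are included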
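A minimal sketch of how the indexing approach above could be adapted: compute each column's minimum over the positive entries only, by temporarily replacing everything <= 0 with np.inf (a swapped-in technique, not the masked-array approach from the answers), then reuse the same I, J indexing. The helper name impute_zeros and the small sample array are hypothetical; note that a column with no positive value at all would receive inf.

```python
import numpy as np

def impute_zeros(scores):
    """Return a copy of `scores` with each 0 replaced by its column's smallest positive value."""
    scores = np.asarray(scores, dtype=float).copy()
    # Hide non-positive entries behind +inf so they can never be the column minimum.
    # (A column with no positive entries would therefore get inf.)
    col_min_pos = np.where(scores > 0, scores, np.inf).min(axis=0)
    # Same (row, column) indexing as in the question.
    I, J = np.nonzero(scores == 0)
    scores[I, J] = col_min_pos[J]
    return scores

# Hypothetical sample data with a few zeros.
scores = np.array([[226, 600, 3.33, 915.0],
                   [0,   700, 0.0,  640.0],
                   [108, 0,   2.31, 0.0]])
print(impute_zeros(scores))
```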
  • More than 0 but less than the max; in other words, the worst value in a column other than 0. Sorry for the confusion. (Commented May 27, 2019 at 12:02)
  • Yes, my bad. I've edited the question. (Commented May 27, 2019 at 12:04)

2 Answers


One way is to use a masked array to find the minimum of each column while masking out the values that are <= 0, and then replace the 0s in the array with the corresponding column minimum using np.where:

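import numpy as np

min_gt0 = np.ma.array(r, mask=r <= 0).min(0)  # per-column minimum, ignoring values <= 0
r_filled = np.where(r == 0, min_gt0, r)       # np.where returns a new array, so assign it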

Here's an example:

import numpy as np

r = np.random.randint(0, 5, (5, 5))  # random, so your values will differ

r
array([[2, 1, 3, 0, 4],
       [0, 4, 4, 2, 2],
       [4, 0, 0, 0, 1],
       [1, 2, 2, 2, 2],
       [2, 0, 4, 4, 2]])

min_gt0 = np.ma.array(r, mask=r<=0).min(0)
np.where(r == 0, min_gt0, r)

array([[2, 1, 3, 2, 4],
       [1, 4, 4, 2, 2],
       [4, 1, 2, 2, 1],
       [1, 2, 2, 2, 2],
       [2, 1, 4, 4, 2]])
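Because r is drawn with np.random.randint, rerunning the example gives different numbers. Here is a sketch of the same two lines on the fixed array printed above, so the result is reproducible; r_imputed is a name chosen here for illustration.

```python
import numpy as np

# The array printed in the example above, written out so the result is reproducible.
r = np.array([[2, 1, 3, 0, 4],
              [0, 4, 4, 2, 2],
              [4, 0, 0, 0, 1],
              [1, 2, 2, 2, 2],
              [2, 0, 4, 4, 2]])

# Per-column minimum with entries <= 0 masked out.
min_gt0 = np.ma.array(r, mask=r <= 0).min(0)

# np.where returns a new array, so keep the result in a variable.
r_imputed = np.where(r == 0, min_gt0, r)
print(r_imputed)
```

The mask uses r <= 0 while the replacement targets only r == 0, so any negative entries would be excluded from the minimum but left in place.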

2 Comments

Thanks for the solution. I was having issues at first, but then realised I needed to assign the result of np.where to a variable.
Yes you have to assign it to a variable. You're welcome @asleniovas :)

I think the numpy.ma.masked_equal function is what you need.

Consider an array:

import numpy as np

a = np.array([4, 3, 8, 0, 5])
m = np.ma.masked_equal(a, 0)  # mask = [0, 0, 0, 1, 0]

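Now m.min() ignores the masked 0 and returns the smallest non-zero value, which here is the second-smallest value overall (for a 2D array, use m.min(axis=0) to get one value per column):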

m.min() # 3
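The same idea extends to the 2D dataset from the question by taking the minimum along axis=0. A minimal sketch, assuming exact zeros mark the values to impute (the small scores array is made up for illustration), that uses MaskedArray.filled to put each column's minimum back into the masked slots. Unlike the <= 0 mask in the other answer, masked_equal masks only the zeros, so any negative values would still count toward the minimum.

```python
import numpy as np

# Hypothetical sample data with a few zeros.
scores = np.array([[226, 600, 3.33, 915.0],
                   [0,   700, 0.0,  640.0],
                   [108, 0,   2.31, 0.0]])

# Mask only the exact zeros; every other value takes part in the minimum.
m = np.ma.masked_equal(scores, 0)

# One minimum per column (axis=0). A column that is entirely 0 would come back
# masked, so fill such entries with 0 to leave that column unchanged.
col_min = m.min(axis=0).filled(0)

# filled() replaces each masked entry, broadcasting col_min across the rows.
imputed = m.filled(col_min)
print(imputed)
```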

