I have a 2D NumPy array that contains my values (some of which can be NaN). I want to pick 30% of the non-NaN values and replace them with the mean of the array. How can I do that? What I have tried so far:

from copy import deepcopy

import numpy as np

def spar_removal(array, mean_value, sparseness):
    array1 = deepcopy(array)
    array2 = array1
    spar_size = int(round(array2.shape[0]*array2.shape[1]*sparseness))
    for i in range(spar_size):
        index = np.random.choice(np.where(array2 != mean_value)[1])
        array2[0, index] = mean_value
    return array2

But this only ever picks positions in the first row of my array. How can I pick positions from all over the array? It seems that choice only works on one dimension. I guess what I want is to compute the (x, y) pairs whose values I will replace with mean_value.

  • Does it need to be exactly 30% of the non-NaN values, or does each non-NaN value need a 30% chance of being replaced? E.g. if we had 100 non-NaN values, do you need exactly 30 of them to be replaced, or would you be okay with each value getting a 30% chance of being replaced so that sometimes you'd get 27 replacements and very rarely 45? Commented Jun 9, 2018 at 13:46
  • Yup, it needs to be exactly 30% of the non-NaNs. Commented Jun 9, 2018 at 13:49
  • There's a difference between remove and replace. Remove implies, at least to me, reducing the shape of the array, e.g. from (100,100) to (90,90) or some such value. While it is easy to remove a whole row or column, removing individual elements is hard without making the array ragged. Commented Jun 9, 2018 at 18:11
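
For contrast, the alternative raised in the first comment (each non-NaN value independently gets a 30% chance of being replaced) could be sketched roughly as below. This is not what the asker wants; it only illustrates why the number of replacements varies from run to run under that reading. The array contents and the seed are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only to make the run repeatable

# Illustrative data: 100 values, three of them NaN
x = np.arange(1.0, 101.0).reshape(10, 10)
x[0, :3] = np.nan

valid = ~np.isnan(x)

# Each non-NaN element is selected independently with probability 0.3,
# so the number of selected elements is random rather than fixed
mask = valid & (rng.random(x.shape) < 0.3)

# Replace the selected elements with the mean computed while ignoring NaNs
result = np.where(mask, np.nanmean(x), x)

print(mask.sum(), "of", valid.sum(), "non-NaN values replaced")
```

Rerunning with a different seed generally gives a different count, which is why an exact 30% needs an explicit sample of indices instead of a random mask.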

2 Answers


There's likely a better way, but consider:

import numpy as np

x = np.array([[1,2,3,4],
              [1,2,3,4],
              [np.nan, np.nan, np.nan, np.nan],
              [1,2,3,4]])

# Get the flat (1-D) indices of the finite elements
# (np.isfinite excludes NaN and also +/-inf)
indices = np.where(np.isfinite(x).ravel())[0]

# Shuffle the indices, select the first 30% (rounded down with int())
to_replace = np.random.permutation(indices)[:int(indices.size * 0.3)]

# Replace those indices with the mean (ignoring NaNs)
x[np.unravel_index(to_replace, x.shape)] = np.nanmean(x)

print(x)

Example Output

[[ 2.5  2.   2.5  4. ]
 [ 1.   2.   3.   4. ]
 [ nan  nan  nan  nan]
 [ 2.5  2.   3.   4. ]]

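NaNs never change, and exactly floor(0.3 * number of non-NaN elements) values are set to the mean (computed while ignoring NaNs). In the example there are 12 non-NaN elements, so floor(3.6) = 3 of them are replaced with 2.5.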
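
The same steps can be wrapped in a reusable function. The following is only a sketch: the name replace_fraction_with_mean and its rng parameter are made up for illustration, it treats only NaN (not inf) as missing, and it uses np.random.default_rng instead of the legacy np.random functions used above.

```python
import numpy as np

def replace_fraction_with_mean(x, fraction, rng=None):
    """Return a copy of x in which floor(fraction * n_valid) randomly chosen
    non-NaN entries are set to the mean of x (computed ignoring NaNs)."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(x, dtype=float)              # float copy; the input is left untouched
    valid = np.flatnonzero(~np.isnan(out))      # flat indices of the non-NaN entries
    n_replace = int(valid.size * fraction)      # rounds down, like int() in the answer
    chosen = rng.choice(valid, size=n_replace, replace=False)
    out.flat[chosen] = np.nanmean(x)            # assign through flat indices
    return out

x = np.array([[1, 2, 3, 4],
              [1, 2, 3, 4],
              [np.nan, np.nan, np.nan, np.nan],
              [1, 2, 3, 4]])

result = replace_fraction_with_mean(x, 0.3, rng=np.random.default_rng(42))
print(result)
```

Sampling with replace=False is what guarantees that no position is picked twice, so the number of replaced values is exact.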


Since np.where returns one index array per dimension (two for a 2D array), this is what you want:

import copy

import numpy as np

def spar_removal(array, mean_value, sparseness):
    array2 = copy.deepcopy(array)
    # NaN != NaN, so this comparison keeps only the non-NaN positions
    indexs = np.where(array2 == array2)
    indexsL = len(indexs[0])
    # Number of values to replace: a fraction of the non-NaN count,
    # so it can never exceed the number of available positions
    spar_size = int(round(indexsL * sparseness))

    for i in np.random.choice(indexsL, spar_size, replace=False):
        indexX = indexs[0][i]
        indexY = indexs[1][i]
        array2[indexX, indexY] = mean_value

    return array2
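
The loop over (row, column) pairs can also be replaced by a single indexed assignment. The following is a rough sketch, not part of the answer: the name spar_removal_vectorized and the rng parameter are hypothetical, and it assumes the fraction is taken of the non-NaN count, as the question asks.

```python
import numpy as np

def spar_removal_vectorized(array, mean_value, sparseness, rng=None):
    """Return a copy of array with a fraction of its non-NaN entries set to mean_value."""
    rng = np.random.default_rng() if rng is None else rng
    result = array.copy()
    # One (row, col) pair per non-NaN element
    rows, cols = np.nonzero(~np.isnan(result))
    n_replace = int(round(rows.size * sparseness))
    # Pick positions in the list of pairs, without repeats
    picked = rng.choice(rows.size, size=n_replace, replace=False)
    # Index with both coordinate arrays at once to assign every chosen pair
    result[rows[picked], cols[picked]] = mean_value
    return result

a = np.arange(25.0).reshape(5, 5)
a[1, :] = np.nan                     # 20 non-NaN values remain
out = spar_removal_vectorized(a, -1.0, 0.3, rng=np.random.default_rng(1))
print(out)
```

Choosing an index into the list of pairs, rather than choosing rows and columns separately, keeps each row index matched with its own column index.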
