
I need to replace the values in a NumPy array that meet a condition with a random number.

I have a function that returns a random value with probability noise_factor (50% by default) and 0 otherwise:

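import random
import numpy as np

def add_noise(noise_factor=0.5):
    # Roll a number in 1..100 (inclusive) and compare it with the threshold
    chance = random.randint(1, 100)
    threshold_prob = noise_factor * 100.

    if chance <= threshold_prob:
        # np.random.randint excludes the upper bound, so this draws from 1..99
        noise = float(np.random.randint(1, 100))
    else:
        noise = 0.

    return noise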

But when I call np.place, it replaces every matching value with the same random number:

np.place(X, X==0., add_noise(0.5))

The problem is that add_noise() is evaluated only once, before np.place is even called, so all the 0. values are replaced with that single noise value.
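
Python evaluates add_noise(0.5) once and hands np.place a single scalar. np.place does accept a sequence for its vals argument, though, and fills the True positions of the mask with its elements in order. A minimal sketch, assuming the question's add_noise and a small example array, that passes one independent draw per zero element. It fixes the "same value everywhere" problem but still calls add_noise once per element in Python, so it does not address speed:

```python
import random
import numpy as np

def add_noise(noise_factor=0.5):
    # Same logic as the question's function: with probability noise_factor
    # return a random integer-valued float in 1..99, otherwise 0.
    chance = random.randint(1, 100)
    if chance <= noise_factor * 100:
        return float(np.random.randint(1, 100))
    return 0.0

# Example array (assumed for illustration): mostly zeros, one non-zero value
X = np.zeros((4, 4))
X[0, 0] = 7.0

mask = X == 0.0
# One independent draw per masked element; np.place consumes them in order
vals = np.array([add_noise(0.5) for _ in range(np.count_nonzero(mask))])
np.place(X, mask, vals)
print(X)
```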

What I am trying to do is "iterate" through every element of the array, check the condition (is it == 0.?), and generate a fresh noise value with add_noise() for each matching element.

I could do this with a for loop over every row and column, but is there a more efficient way to do it?

  • Did either of the posted solutions work for you? Commented Mar 2, 2017 at 16:30
  • Yes, the vectorized approach was good. Thanks. Commented Mar 3, 2017 at 17:47

3 Answers


Here's one vectorized approach -

import numpy as np

noise_factor = 0.5  # Input param; X is the question's float array

# Get the mask of zero places and its count. Also compute the threshold
mask = X == 0
c = np.count_nonzero(mask)
threshold_prob = noise_factor * 100.

# Generate c "chance" numbers at once. This is where vectorization comes into play.
# np.random.randint excludes the upper bound, so use 101 to draw from 1..100,
# matching the question's random.randint(1, 100).
nums = np.random.randint(1, 101, c)

# The final piece of vectorization replaces the IF-ELSE with np.where,
# which makes the same choice, but for all elements at once
vals = np.where(nums <= threshold_prob, np.random.randint(1, 100, c), 0)

# Assign back into X
X[mask] = vals

An additional benefit is that we reuse the mask of 0s both for the add_noise step (its count sizes the random arrays) and for assigning back into X. This replaces the use of np.place and is meant to improve efficiency.
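
The same steps can be packaged as a reusable function. This is a sketch, not the answerer's code: add_noise_vectorized is a hypothetical name, and it swaps in NumPy's Generator API (np.random.default_rng) so a seeded generator can be passed for reproducible results. It also draws the chance as rng.random(c) < noise_factor instead of comparing integers against a percentage threshold:

```python
import numpy as np

def add_noise_vectorized(X, noise_factor=0.5, rng=None):
    """Replace each zero in X, independently with probability noise_factor,
    by a random integer-valued float in 1..99; other zeros stay 0.
    Modifies X in place and returns it."""
    rng = np.random.default_rng() if rng is None else rng
    mask = X == 0
    c = np.count_nonzero(mask)
    # One Bernoulli trial and one candidate noise value per zero element
    hit = rng.random(c) < noise_factor
    noise = rng.integers(1, 100, size=c)  # upper bound exclusive -> 1..99
    X[mask] = np.where(hit, noise, 0)
    return X

X = np.zeros((3, 4))
X[1, 2] = 5.0
add_noise_vectorized(X, noise_factor=0.5, rng=np.random.default_rng(42))
print(X)
```

With noise_factor=0.0 no zeros are replaced, and with noise_factor=1.0 all of them are, because rng.random draws from [0, 1).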

Further performance boost

We could optimize the steps that compute nums and vals further: instead of generating random numbers twice, generate them once as two rows and use the second row for the noise values, like so -

nums = np.random.randint(1,100, (2,c))
vals = np.where(nums[0] <= threshold_prob, nums[1] , 0)

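You could vectorize your function with np.vectorize, which makes it easy to apply to every element. Note, though, that np.vectorize is provided mainly for convenience: it still calls the Python function once per element, so it is not much faster than an explicit loop.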

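import random
import numpy as np

def add_noise(x):
    # Only zeros are candidates for noise; other values pass through unchanged
    if not x:
        if random.random() <= 0.5:
            noise = float(np.random.randint(1, 100))
            return noise
        else:
            return 0.
    else:
        return x

x = np.zeros(shape=(10, 10))

# otypes=[float] fixes the output dtype up front; otherwise np.vectorize
# infers it from an extra trial call on the first element
n = np.vectorize(add_noise, otypes=[float])
x = n(x)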
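
To see the difference in practice, here is a rough timing sketch comparing np.vectorize with the mask-based approach from the first answer. add_noise_scalar and add_noise_masked are hypothetical names, the array size is arbitrary, and absolute timings depend on the machine, so no numbers are claimed here:

```python
import random
import timeit
import numpy as np

def add_noise_scalar(x, noise_factor=0.5):
    # Per-element version, as in this answer: only zeros are candidates
    if x == 0 and random.random() < noise_factor:
        return float(np.random.randint(1, 100))
    return x

vec_noise = np.vectorize(add_noise_scalar, otypes=[float])

def add_noise_masked(X, noise_factor=0.5):
    # Mask-based version from the first answer, returning a new array
    X = X.copy()
    mask = X == 0
    c = np.count_nonzero(mask)
    X[mask] = np.where(np.random.random(c) < noise_factor,
                       np.random.randint(1, 100, c), 0)
    return X

X = np.zeros((100, 100))
t_vec = timeit.timeit(lambda: vec_noise(X), number=5)
t_mask = timeit.timeit(lambda: add_noise_masked(X), number=5)
print(f"np.vectorize: {t_vec:.4f}s  mask-based: {t_mask:.4f}s")
```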


If I understand correctly, you want to change values of a NumPy array to a random value based on two conditions:

  1. value should be zero
  2. some random chance factor

For these two conditions you can create two boolean masks and combine them with np.logical_and, and use the np.random functions to draw whole arrays of random numbers at once.

import numpy as np

def add_perhaps_noise_if_zero(x, threshold=0.5):
    mask_1 = x == 0.0                                # condition 1: value is zero
    mask_2 = np.random.random(x.shape) <= threshold  # condition 2: random chance
    mask_combined = np.logical_and(mask_1, mask_2)
    # Adds uniform noise in [0, 1) at the selected positions (modifies x in place)
    x[mask_combined] += np.random.random(x.shape)[mask_combined]
    return x


x = np.zeros((5, 5))
for i in range(5):
    print(x)
    x = add_perhaps_noise_if_zero(x)
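
For boolean arrays the & operator does the same job as np.logical_and, and np.where can then choose between a noise array and the original values in one step, without modifying x in place. A sketch under those assumptions: noisy_copy is a hypothetical name, and unlike the code above it draws integer noise in 1..99 to match the question's add_noise, rather than uniform floats in [0, 1):

```python
import numpy as np

def noisy_copy(x, threshold=0.5, rng=None):
    """Return a new array where each zero of x is, with probability
    `threshold`, replaced by a random integer-valued float in 1..99."""
    rng = np.random.default_rng() if rng is None else rng
    # Condition 1 (value is zero) & condition 2 (random chance), elementwise
    replace = (x == 0) & (rng.random(x.shape) < threshold)
    # Draw candidate noise for every position, keep it only where `replace`
    return np.where(replace, rng.integers(1, 100, size=x.shape), x)

x = np.zeros((5, 5))
x[0, 0] = 3.0
y = noisy_copy(x, rng=np.random.default_rng(0))
print(y)
```

Because np.where promotes the integer noise and the float input to a common type, y keeps a float dtype.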
