
I'm using python.

I have what might be an easy question, though I can't see it.

If I have arrays x = array([1.0, 0.0, 1.5, 0.0, 6.0]) and y = array([1, 2, 3, 4, 5])

I'm looking for an efficient way to pick randomly among 1.0, 1.5 and 6.0, ignoring all zeros, while retaining the index for comparison with another array such as y. So if I were to randomly pick 6.0, I could still relate it to y[4].

The reason for wanting efficiency is that eventually I want to pick among perhaps 10 values in an array of 1000+, with the rest zero. The size of the array depends on how intensive the other calculations become, but it could easily grow much larger than 1000.

Thanks

4 Comments
  • Oddly enough, a very similar question was just asked here; you can use .nonzero() to get the indices of the nonzero elements and then choose one of those. Does that make sense? Commented Mar 16, 2014 at 18:27
  • @DSM I'm not sure that's efficient in terms of either runtime or memory. Commented Mar 16, 2014 at 18:29
  • @DavidEhrmann: if there are relatively few non-zero values, as the OP has (10 of 1000+) then you're going to have to find them to get decent performance, which is O(n). You could avoid storing the indices (which if the matrix is sparse is going to be negligible storage anyway) by looping and drawing a new random number each time a nonzero value is spotted but that's going to be a lot slower. Commented Mar 16, 2014 at 18:35
  • Thanks guys I appreciate the help. I'll have a look through and post back. Commented Mar 17, 2014 at 2:08
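For reference, the .nonzero() idea from the comments can be sketched like this (variable names are mine, not from the thread):

```python
import numpy as np

x = np.array([1.0, 0.0, 1.5, 0.0, 6.0])
y = np.array([1, 2, 3, 4, 5])

idx = x.nonzero()[0]        # indices of the nonzero entries: [0, 2, 4]
i = np.random.choice(idx)   # one random index among them
x[i], y[i]                  # the picked value and its counterpart in y
```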

2 Answers


.nonzero() only skips zeros; if you want to skip arbitrary values, this approach works:

In [281]:

x = np.array([1.0, 0.0, 1.5, 0.0, 6.0])
non_zero_idx = np.argwhere(x != 0).flatten()  # indices of the nonzero values
np.random.shuffle(non_zero_idx)               # randomize their order
random_pick = x[non_zero_idx]                 # nonzero values in shuffled order
random_pick[0]
Out[281]:
1.5
In [282]:

%%timeit

x = np.array([1.0, 0.0, 1.5, 0.0, 6.0])
non_zero_idx = np.argwhere(x != 0).flatten()
np.random.shuffle(non_zero_idx)
random_pick = x[non_zero_idx]
random_pick[0]
10000 loops, best of 3: 104 µs per loop

If you are only interested in a single random pick (rather than an array of random picks), x[np.random.choice(non_zero_idx, 1)] is enough. But that is actually slower:

In [286]:

%%timeit

x = np.array([1.0, 0.0, 1.5, 0.0, 6.0])
non_zero_idx = np.argwhere(x != 0).flatten()
x[np.random.choice(non_zero_idx, 1)]
10000 loops, best of 3: 173 µs per loop

There are three ways to get random picks which, as @AndyHayden pointed out, behave differently at different array sizes:

In [299]:

X=np.random.random(100000)
In [300]:

%timeit np.random.choice(X, 1)
%timeit X[np.random.randint(0,len(X),1)]
10000 loops, best of 3: 70.3 µs per loop
100000 loops, best of 3: 12.6 µs per loop
In [301]:

%%timeit
np.random.shuffle(X)
X[0]
10 loops, best of 3: 130 ms per loop

In [302]:

X=np.random.random(10000)
In [303]:

%timeit np.random.choice(X, 1)
%timeit X[np.random.randint(0,len(X),1)]
10000 loops, best of 3: 70.1 µs per loop
100000 loops, best of 3: 12.6 µs per loop
In [304]:

%%timeit
np.random.shuffle(X)
X[0]
100 loops, best of 3: 12.6 ms per loop

It appears X[np.random.randint(0, len(X), 1)] is the fastest option.
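Putting the timings together, a sketch that combines the index extraction with the randint-style pick (my own combination of the snippets above, not a separate answer):

```python
import numpy as np

x = np.array([1.0, 0.0, 1.5, 0.0, 6.0])
non_zero_idx = np.argwhere(x != 0).ravel()                 # indices of nonzero values
i = non_zero_idx[np.random.randint(0, len(non_zero_idx))]  # one random nonzero index
x[i]                                                       # the pick; i still indexes y
```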


3 Comments

It may only be slower for small examples.
Good point. The shuffle approach would most likely be the best choice when we want an array to iterate through.
Another trick is to use ravel rather than flatten (as this doesn't create a copy). (can't +1 again, but thanks for the additional timings! :) )
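The ravel-vs-flatten point can be checked with np.shares_memory (a small sketch of my own, not from the thread):

```python
import numpy as np

x = np.array([1.0, 0.0, 1.5, 0.0, 6.0])
idx2d = np.argwhere(x != 0)        # shape (3, 1)

flat = idx2d.flatten()             # always returns a copy
rav = idx2d.ravel()                # returns a view when possible

print(np.shares_memory(idx2d, flat))   # False -- flatten copied the data
print(np.shares_memory(idx2d, rav))    # True  -- ravel reused the buffer
```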

If you're willing to accept probabilistic running time and fewer than 50% of the values are ignored, you can just draw again until you get an acceptable value.

If you can't, you're going to have to go over the entire array at least once to know which values to ignore, and that takes O(n) memory in the worst case.
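The retry idea can be sketched as follows (my own sketch; it assumes zeros are rare enough that retries stay cheap, which is not the asker's 10-of-1000 case):

```python
import numpy as np

x = np.array([1.0, 3.0, 1.5, 0.0, 6.0])   # mostly nonzero, so retries are cheap

i = np.random.randint(len(x))
while x[i] == 0.0:                        # ignored value: draw a new index
    i = np.random.randint(len(x))

x[i]                                      # a random nonzero value; i still indexes y
```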

