
I'm using python.

I have what might be an easy question, though I can't see it.

If I have arrays x = array([1.0, 0.0, 1.5, 0.0, 6.0]) and y = array([1, 2, 3, 4, 5])

I'm looking for an efficient way to pick randomly among 1.0, 1.5 and 6.0, ignoring all zeros, while retaining the index for comparison with another array such as y. So if I were to randomly pick 6.0, I could still relate it to y[4].

The reason for wanting efficiency is that eventually I want to pick among perhaps 10 values in an array of 1000+, with the rest zero. The size of the array depends on how intensive the other calculations become, but it could easily grow much larger than 1000.

Thanks

4 Comments
  • Oddly enough, a very similar question was just asked here; you can use .nonzero() to get the indices of the nonzero elements and then choose one of those. Does that make sense? Commented Mar 16, 2014 at 18:27
  • @DSM I'm not sure that's efficient in terms of either runtime or memory. Commented Mar 16, 2014 at 18:29
  • @DavidEhrmann: if there are relatively few non-zero values, as the OP has (10 of 1000+) then you're going to have to find them to get decent performance, which is O(n). You could avoid storing the indices (which if the matrix is sparse is going to be negligible storage anyway) by looping and drawing a new random number each time a nonzero value is spotted but that's going to be a lot slower. Commented Mar 16, 2014 at 18:35
  • Thanks guys I appreciate the help. I'll have a look through and post back. Commented Mar 17, 2014 at 2:08
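For reference, the .nonzero() idea from the comments can be sketched like this (variable names are mine, not from the thread):

```python
import numpy as np

x = np.array([1.0, 0.0, 1.5, 0.0, 6.0])
y = np.array([1, 2, 3, 4, 5])

idx = x.nonzero()[0]        # indices of the nonzero entries: [0, 2, 4]
i = np.random.choice(idx)   # one random index among them
x[i], y[i]                  # the picked value and its counterpart in y
```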

2 Answers


.nonzero() only skips zeros; if you want to skip arbitrary values, this approach works:

In [281]:

x = np.array([1.0, 0.0, 1.5, 0.0, 6.0])
non_zero_idx = np.argwhere(x != 0).flatten()  # indices of the nonzero values
np.random.shuffle(non_zero_idx)               # randomize their order
random_pick = x[non_zero_idx]                 # nonzero values in shuffled order
random_pick[0]
Out[281]:
1.5
In [282]:

%%timeit

x = np.array([1.0, 0.0, 1.5, 0.0, 6.0])
non_zero_idx = np.argwhere(x != 0).flatten()
np.random.shuffle(non_zero_idx)
random_pick = x[non_zero_idx]
random_pick[0]
10000 loops, best of 3: 104 µs per loop

If you are only interested in a single random pick (rather than an array of random picks), x[np.random.choice(non_zero_idx, 1)] is enough. But that is actually slower:

In [286]:

%%timeit

x = np.array([1.0, 0.0, 1.5, 0.0, 6.0])
non_zero_idx = np.argwhere(x != 0).flatten()
x[np.random.choice(non_zero_idx, 1)]
10000 loops, best of 3: 173 µs per loop

There are three ways to get random picks which, as @AndyHayden pointed out, behave differently at different array sizes:

In [299]:

X=np.random.random(100000)
In [300]:

%timeit np.random.choice(X, 1)
%timeit X[np.random.randint(0,len(X),1)]
10000 loops, best of 3: 70.3 µs per loop
100000 loops, best of 3: 12.6 µs per loop
In [301]:

%%timeit
np.random.shuffle(X)
X[0]
10 loops, best of 3: 130 ms per loop

In [302]:

X=np.random.random(10000)
In [303]:

%timeit np.random.choice(X, 1)
%timeit X[np.random.randint(0,len(X),1)]
10000 loops, best of 3: 70.1 µs per loop
100000 loops, best of 3: 12.6 µs per loop
In [304]:

%%timeit
np.random.shuffle(X)
X[0]
100 loops, best of 3: 12.6 ms per loop

It appears X[np.random.randint(0, len(X), 1)] is the fastest option.
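Putting the timings together, a sketch that combines the index extraction with the randint-style pick (my own combination of the snippets above, not a separate answer):

```python
import numpy as np

x = np.array([1.0, 0.0, 1.5, 0.0, 6.0])
non_zero_idx = np.argwhere(x != 0).ravel()                 # indices of nonzero values
i = non_zero_idx[np.random.randint(0, len(non_zero_idx))]  # one random nonzero index
x[i]                                                       # the pick; i still indexes y
```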


3 Comments

It may only be slower for small examples.
Good point. The shuffle approach would most likely be the best choice when we want an array to iterate through.
Another trick is to use ravel rather than flatten (as this doesn't create a copy). (can't +1 again, but thanks for the additional timings! :) )
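The ravel-vs-flatten point can be checked with np.shares_memory (a small sketch of my own, not from the thread):

```python
import numpy as np

x = np.array([1.0, 0.0, 1.5, 0.0, 6.0])
idx2d = np.argwhere(x != 0)        # shape (3, 1)

flat = idx2d.flatten()             # always returns a copy
rav = idx2d.ravel()                # returns a view when possible

print(np.shares_memory(idx2d, flat))   # False -- flatten copied the data
print(np.shares_memory(idx2d, rav))    # True  -- ravel reused the buffer
```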

If you're willing to accept probabilistic running time and fewer than 50% of the values are ignored, you can just draw again until you get an acceptable value.

If you can't, you're going to have to go over the entire array at least once to know which values to ignore, and that takes O(n) memory in the worst case.
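The retry idea can be sketched as follows (my own sketch; it assumes zeros are rare enough that retries stay cheap, which is not the asker's 10-of-1000 case):

```python
import numpy as np

x = np.array([1.0, 3.0, 1.5, 0.0, 6.0])   # mostly nonzero, so retries are cheap

i = np.random.randint(len(x))
while x[i] == 0.0:                        # ignored value: draw a new index
    i = np.random.randint(len(x))

x[i]                                      # a random nonzero value; i still indexes y
```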

