I have a Numpy array that looks like

>>> a
array([[ 3. ,  2. , -1. ],
       [-1. ,  0.1,  3. ],
       [-1. ,  2. ,  3.5]])

I would like to select a value from each row at random, but I would like to exclude the -1 values from the random sampling.

What I do currently is:

import numpy
import random

x = []
for i in range(a.shape[0]):
    # Column indices of this row's usable values (everything except the -1 entries).
    idx = numpy.where(a[i, :] > 0)[0]
    # Draw one of those indices at random.
    idxr = random.sample(list(idx), 1)[0]
    xi = a[i, idxr]
    x.append(xi)

and get

>>> x
[3.0, 3.0, 2.0]

This is becoming a bit slow for large arrays and I would like to know if there is a way to conditionally select random values from the original a matrix without dealing with each row individually.

  • I don't have any experience with NumPy but I would have guessed that generating a random number would take longer than accessing the value from the array. The same is true of appending to a list. Have you profiled your program to make sure you're optimizing the right thing? Commented Jun 30, 2010 at 16:24
  • I've profiled the program and the idx and idxr lines are the slowest, with an almost equal amount of time spent on each. Commented Jun 30, 2010 at 17:11
  • Do you always expect to have the same number of excluded values in each row? If so, you can vectorize the whole thing and do it in two lines of code with no python loops... Commented Jun 30, 2010 at 22:18
  • @Joe Kington: not necessarily. For all intents and purposes, the rows belong to independent samples. Commented Jul 1, 2010 at 2:11

1 Answer

I really don't think you will find anything in NumPy that does exactly what you are asking out of the box, so I'll offer the optimizations I could think of.

There are several things that could make this slow. First, numpy.where() is rather slow because it has to check every value in the row (and a slice is generated for each row as well) and then build an array of indices. If you plan on repeating this process on the same matrix, the best thing you could do is sort each row first. Then you can use a binary search to find where the positive values start and draw a single random number to select one of them. Of course, you could also just store, for each row, the index where the positive values start after finding it once with a binary search.
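A minimal sketch of the sort-and-binary-search idea, assuming (as in the example) that the excluded entries are all -1 and every valid value is positive; it uses numpy.searchsorted for the binary search and the modern numpy.random.Generator API, which postdates this answer:

```python
import numpy as np

rng = np.random.default_rng()

a = np.array([[ 3. ,  2. , -1. ],
              [-1. ,  0.1,  3. ],
              [-1. ,  2. ,  3.5]])

# Sort each row once; the -1 entries collect at the front of each row.
s = np.sort(a, axis=1)

# Binary-search each sorted row for where the positive values begin.
starts = np.array([np.searchsorted(row, 0.0, side='right') for row in s])

# Draw one uniform random offset per row into that row's positive region.
n_rows, n_cols = a.shape
offsets = rng.integers(0, n_cols - starts)  # high is exclusive, per-row
x = s[np.arange(n_rows), starts + offsets]
```

If the matrix is reused, `s` and `starts` can be computed once and only the last two lines repeated per draw.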

If you don't plan on repeating this process many times, then I would recommend using Cython to speed up the numpy.where line. Cython lets you avoid slicing out each row and speeds up the process overall.

My last suggestion is to use random.choice rather than random.sample unless you really do plan on choosing sample sizes that are larger than 1.
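For the single-element case, random.choice picks one item directly, avoiding the sample-size bookkeeping that random.sample does; a small sketch with a made-up index list standing in for one row's valid indices:

```python
import random

# Hypothetical candidate (non-excluded) column indices for one row.
idx = [0, 1]

# random.choice returns a single element directly,
# equivalent to random.sample(idx, 1)[0] but cheaper.
value = random.choice(idx)
```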


1 Comment

I'll be doing this process on similar but newly generated arrays many times over, so I'll look into Cython. Thanks!
