numpy.random.choice is a handy tool for sampling random elements from a 1D array:

In [94]: numpy.random.choice(numpy.arange(5), 10)
Out[94]: array([3, 1, 4, 3, 4, 3, 2, 4, 1, 1])

However, the docs specify that the parameter a must be one-dimensional. So if I want a random selection of rows from a 2D array (for example, random samples for a one-hot encoder), numpy.random.choice cannot be applied to the array directly.

So if my input is:

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])  

How can I randomly sample n rows from this array, like this (n = 10)?

array([[ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 0.,  0.,  1.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 1.,  0.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.]])

2 Answers

As per this issue, the feature was considered in 2014, but no substantial additions have been made to the API since then. There is, however, a better solution that cleverly makes use of numpy.random.choice and numpy's fancy indexing:

Starting with

In [102]: x = numpy.eye(3); x
Out[102]: 
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

You may use numpy.random.choice to generate a list of random indices, like this:

In [103]: i = numpy.random.choice(3, 10); i
Out[103]: array([2, 2, 0, 2, 1, 1, 2, 0, 0, 1])

Then use i to index x:

In [104]: x[i]
Out[104]: 
array([[ 0.,  0.,  1.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 0.,  0.,  1.],
       [ 0.,  1.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 0.,  1.,  0.]])

With a workaround this efficient, I don't believe a change to the API is necessary.

Note that the procedure is the same for generating rows with a particular probability distribution: specify the distribution over the indices themselves, using the p argument of numpy.random.choice.
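
As a rough sketch of the weighted case, the index-based approach above might look like this. The weights in p are made up for illustration and are not from the question; the only requirement is that they are non-negative, sum to 1, and have one entry per row.

```python
import numpy as np

x = np.eye(3)

# Illustrative (assumed) weights: one probability per row of x, summing to 1.
p = [0.7, 0.2, 0.1]

# Sample 10 row indices according to p, then index x with them.
i = np.random.choice(x.shape[0], size=10, p=p)
rows = x[i]

print(rows.shape)  # (10, 3)
```

With these weights, the first row should turn up most often, though any single draw of 10 can deviate.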

3 Comments

this is essentially a for-loop method ... which is kinda ugly, especially for numpy. Isn't there a better way?
@robertotomás what makes you think this is essentially a for loop? Everything here is vectorised. I don't think they've come up with something better yet.
I just realized that that wasn't for i in i, it was the whole thing at once :) thank you

Just to add another way of selecting rows from a 2-D array: numpy.random.Generator.choice. About halfway down the page at https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html, the docs note that "sampling random rows from a 2-D array is . . . possible with Generator.choice through its axis keyword."

This approach works with a pandas DataFrame too. The only catch is that the sampled result comes back as a NumPy array, which you can easily convert back to a DataFrame.

Piggy-backing off what cs95 did, you could do the following:

import numpy as np

x = np.eye(3)

# numpy.random.Generator.choice can sample along a chosen axis
rng = np.random.default_rng()

# Draw 10 rows (axis=0) with replacement
y = rng.choice(a=x, size=10, replace=True, axis=0)
y

array([[0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [1., 0., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.]])
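
For the pandas case mentioned above, a minimal sketch of the round trip could look like the following. The column names are hypothetical, and the DataFrame is converted explicitly with to_numpy() before sampling, which is a cautious choice on my part rather than something the docs require.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame; the column names are made up for illustration.
df = pd.DataFrame(np.eye(3), columns=["a", "b", "c"])

rng = np.random.default_rng()

# Sample 10 rows with replacement; the result is a plain NumPy array.
sampled = rng.choice(a=df.to_numpy(), size=10, replace=True, axis=0)

# Rebuild a DataFrame, reusing the original column labels.
sampled_df = pd.DataFrame(sampled, columns=df.columns)

print(sampled_df.shape)  # (10, 3)
```

Note that the original row index is lost this way. If the index matters, sampling row positions and using df.iloc on them is an alternative.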
