Sampling rows in 2D numpy arrays with replacement

Question

numpy.random.choice is a handy tool for sampling random elements from a 1D array:

In [94]: numpy.random.choice(numpy.arange(5), 10)
Out[94]: array([3, 1, 4, 3, 4, 3, 2, 4, 1, 1])

However, the docs specify that the argument a must be one-dimensional, so if I want to get a random selection of rows from a 2D array (for example, random samples for a one-hot encoder), numpy.random.choice cannot be used directly.

So if my input is:

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

How can I get n rows in random fashion from this array, like this? (n = 10)

array([[ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 0.,  0.,  1.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 1.,  0.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.]])
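A quick way to see the limitation described above: passing a 2-D array as a to the legacy numpy.random.choice raises a ValueError. This is a minimal sketch. The exact error message may vary between NumPy versions, so only the exception type is relied on here.

```python
import numpy as np

x = np.eye(3)  # 3x3 array whose rows are one-hot vectors

try:
    np.random.choice(x, 10)  # a is 2-D, so choice rejects it
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

print(raised)
```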

cs95 · Accepted Answer · 2017-07-14 11:35:34Z


As per this issue, the feature was considered in 2014, but no substantial additions have been made to the API since then. There is, however, a simple workaround that combines numpy.random.choice with numpy's fancy indexing:

Starting with

In [102]: x = numpy.eye(3); x
Out[102]: 
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

You may use numpy.random.choice to generate an array of random row indices (sampled with replacement by default), like this:

In [103]: i = numpy.random.choice(3, 10); i
Out[103]: array([2, 2, 0, 2, 1, 1, 2, 0, 0, 1])

Then use i to index x:

In [104]: x[i]
Out[104]: 
array([[ 0.,  0.,  1.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 0.,  0.,  1.],
       [ 0.,  1.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 0.,  1.,  0.]])

With a workaround this efficient, I don't believe a change to the API is necessary.
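Outside IPython, the same two steps can be wrapped in a small function. This is only a sketch: sample_rows is a hypothetical helper name, not part of NumPy, and it simply combines the index draw and the fancy-indexing gather shown above.

```python
import numpy as np


def sample_rows(x, n):
    """Return n rows of x drawn uniformly at random, with replacement.

    Hypothetical helper: it just wraps numpy.random.choice plus fancy indexing.
    """
    # Step 1: draw n row indices from range(x.shape[0]); replace=True is the default.
    idx = np.random.choice(x.shape[0], n)
    # Step 2: fancy indexing gathers all selected rows in a single operation.
    return x[idx]


x = np.eye(3)
samples = sample_rows(x, 10)
print(samples.shape)  # (10, 3)
```

Because only x.shape[0] is used, the same helper also works for arrays with more than two dimensions, sampling along the first axis.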

Do note that sampling rows with a non-uniform probability distribution works the same way: pass the distribution over the row indices through the p argument of numpy.random.choice, then index x with the result.
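A minimal sketch of the weighted case, assuming an arbitrary set of weights chosen purely for illustration: the probabilities are attached to the indices 0, 1 and 2, and the rows are gathered exactly as before.

```python
import numpy as np

x = np.eye(3)

# Illustrative weights over the row indices 0, 1, 2 (they must sum to 1):
# row 0 is drawn about 75% of the time, row 1 about 25%, row 2 never.
p = [0.75, 0.25, 0.0]

i = np.random.choice(x.shape[0], 10, p=p)  # weighted indices, with replacement
samples = x[i]                             # gather the corresponding rows

print(samples)
```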

edited Jul 14, 2017 at 11:35

answered Jul 14, 2017 at 3:01


3 Comments

roberto tomás Over a year ago

this is essentially a for-loop method ... which is kinda ugly, especially for numpy. Isn't there a better way?

cs95 Over a year ago

@robertotomás what makes you think this is essentially a for loop? Everything here is vectorised. I don't think they've come up with something better yet.

roberto tomás Over a year ago

I just realized that that wasn't for i in i, it was the whole thing at once :) thank you
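To make the point from these comments concrete: x[i] produces the same array as an explicit Python loop over the indices, but the whole index array is applied inside NumPy in one operation. The loop version below exists only for comparison.

```python
import numpy as np

x = np.eye(3)
i = np.random.choice(3, 10)

# Explicit Python loop: one row lookup per index (for comparison only).
looped = np.array([x[k] for k in i])

# Fancy indexing: the whole index array is applied at once.
vectorised = x[i]

print(np.array_equal(looped, vectorised))  # True
```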

GSA · Accepted Answer · 2022-02-02 14:28:43Z

Just to add another way of selecting rows from a 2-D array: use numpy.random.Generator.choice. About halfway down the numpy.random.choice documentation page (https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html), it notes that "sampling random rows from a 2-D array is . . . possible with Generator.choice through its axis keyword."

This approach works with a pandas DataFrame too. The only catch is that the sampled result comes back as a NumPy array rather than a DataFrame, but you can easily convert it back.

Piggy-backing off what cs95 did, you could do the following:

import numpy as np

x = np.eye(3); x

# Sample 10 rows (axis=0) with replacement using numpy.random.Generator.choice
rng = np.random.default_rng()

y = rng.choice(a=x, size=10, replace=True, axis=0)
y

array([[0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [1., 0., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.]])
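For the pandas case mentioned above, here is a minimal sketch assuming pandas is installed; the DataFrame and its column names are made up for illustration. Generator.choice converts the DataFrame to an array before sampling, so the result is a plain ndarray and the column labels have to be reattached with the pd.DataFrame constructor. The original index labels are not carried over.

```python
import numpy as np
import pandas as pd

# Hypothetical one-hot DataFrame with illustrative column names.
df = pd.DataFrame(np.eye(3), columns=["a", "b", "c"])

rng = np.random.default_rng()

# Sample 10 rows with replacement; the result is a NumPy array, not a DataFrame.
arr = rng.choice(df, size=10, replace=True, axis=0)

# Convert back, reusing the original column labels.
sampled = pd.DataFrame(arr, columns=df.columns)
print(sampled)
```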
