Shuffling NumPy array along a given axis

Question

Given the following NumPy array,

> a = array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5],[1, 2, 3, 4, 5]])

it's simple enough to shuffle a single row,

> shuffle(a[0])
> a
array([[4, 2, 1, 3, 5],[1, 2, 3, 4, 5],[1, 2, 3, 4, 5]])

Is it possible to use indexing notation to shuffle each of the rows independently, or do you have to iterate over the array? I had in mind something like,

> numpy.shuffle(a[:])
> a
array([[4, 2, 3, 5, 1],[3, 1, 4, 5, 2],[4, 2, 1, 3, 5]]) # Not the real output

though this clearly doesn't work.

Community · Accepted Answer · 2020-06-20 09:12:55Z

32

Vectorized solution with `rand+argsort` trick

We could generate unique indices along the specified axis and index into the input array with advanced indexing. To generate the unique indices, we use the random float generation + sort trick: argsorting an array of random floats along an axis yields an independent random permutation of indices for each slice, which gives us a vectorized solution. We can also generalize it to n-dimensional arrays and arbitrary axes with np.take_along_axis. The final implementation would look something like this -

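import numpy as np

def shuffle_along_axis(a, axis):
    # Argsorting random floats gives an independent permutation per slice along `axis`
    idx = np.random.rand(*a.shape).argsort(axis=axis)
    return np.take_along_axis(a, idx, axis=axis)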

Note that this shuffle won't be in-place and returns a shuffled copy.

Sample run -

In [33]: a
Out[33]: 
array([[18, 95, 45, 33],
       [40, 78, 31, 52],
       [75, 49, 42, 94]])

In [34]: shuffle_along_axis(a, axis=0)
Out[34]: 
array([[75, 78, 42, 94],
       [40, 49, 45, 52],
       [18, 95, 31, 33]])

In [35]: shuffle_along_axis(a, axis=1)
Out[35]: 
array([[45, 18, 33, 95],
       [31, 78, 52, 40],
       [42, 75, 94, 49]])
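
A minimal sketch of the same trick, assuming a seeded `np.random.default_rng` generator in place of the global `np.random.rand`. The `rng` parameter and the seed are illustrative additions, not part of the answer above. It checks that every row of the result is a permutation of the corresponding original row:

```python
import numpy as np

def shuffle_along_axis(a, axis, rng=None):
    # `rng` is a hypothetical extra parameter, added here for reproducibility.
    rng = np.random.default_rng() if rng is None else rng
    # Sorting i.i.d. random floats along `axis` yields an independent
    # random permutation of indices for every 1-D slice along that axis.
    idx = rng.random(a.shape).argsort(axis=axis)
    return np.take_along_axis(a, idx, axis=axis)

a = np.array([[18, 95, 45, 33],
              [40, 78, 31, 52],
              [75, 49, 42, 94]])
shuffled = shuffle_along_axis(a, axis=1, rng=np.random.default_rng(0))

# Sorting each row undoes the shuffle, so the sorted rows must match.
rows_preserved = np.array_equal(np.sort(shuffled, axis=1), np.sort(a, axis=1))
print(shuffled)
print(rows_preserved)
```

Since the function returns a copy, `a` itself is left untouched.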

edited Jun 20, 2020 at 9:12

CommunityBot

answered Mar 23, 2019 at 19:04

Divakar

2 Comments

Nils Over a year ago

Interesting solution! However, I ran a quick experiment and it was far slower (on the order of 1000x) than the naive solution below, which repeatedly invokes rng.shuffle. Can anyone confirm this? Why is it so slow?

Naphat Amundsen Over a year ago

@Nils I am not sure the naive solution you are referring to is still here, but one explanation is that rng.shuffle only shuffles in place (O(n) time complexity). For this solution you have to allocate memory for the unique indices, sort with argsort (O(n log n) time complexity), and then allocate new memory for the result as well. Thus the naive solution scales better for large arrays.
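
To check this claim on a given machine, a rough timing sketch might look like the following. The array size, repeat count, and helper names `argsort_shuffle` and `loop_shuffle` are arbitrary choices made for illustration, and the outcome will depend on array shape, NumPy version, and hardware. The Python-level loop also carries a per-row overhead, so arrays with many short rows may rank differently from arrays with a few long rows:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
a = np.arange(200 * 500).reshape(200, 500)  # arbitrary test size

def argsort_shuffle(arr):
    # Allocates an index array and a result array; sorting is O(n log n) per row.
    idx = rng.random(arr.shape).argsort(axis=1)
    return np.take_along_axis(arr, idx, axis=1)

def loop_shuffle(arr):
    # Shuffles each row in place (O(n) per row), working on a copy of the input.
    out = arr.copy()
    for row in out:
        rng.shuffle(row)
    return out

t_argsort = timeit.timeit(lambda: argsort_shuffle(a), number=20)
t_loop = timeit.timeit(lambda: loop_shuffle(a), number=20)
print(f"argsort: {t_argsort:.4f} s, loop: {t_loop:.4f} s")
```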

Sven Marnach · Accepted Answer · 2020-05-26 12:31:17Z

24

You have to call numpy.random.shuffle() several times because you are shuffling several sequences independently. numpy.random.shuffle() works on any mutable sequence and is not actually a ufunc. The shortest and most efficient code to shuffle all rows of a two-dimensional array a separately is probably

list(map(numpy.random.shuffle, a))

Some people prefer to write this as a list comprehension instead:

[numpy.random.shuffle(x) for x in a]

edited May 26, 2020 at 12:31

answered Feb 18, 2011 at 17:15

Sven Marnach

8 Comments

lafras Over a year ago

Thanks, simple and clean solution.

drevicko Over a year ago

at least for python 3.5, numpy 1.10.2, this doesn't work, a remains unchanged.

Sven Marnach Over a year ago

@drevicko: What dimension does your array have? This answer is for shuffling all rows of a two-dimensional array (and I'm sure it also works with your combination of Python and Numpy versions).

drevicko Over a year ago

Aha! I see what happened: in Python 3.5, map is lazy, producing an iterator, and doesn't do the mapping until you iterate through it. If you do e.g.: for _ in map(...): pass it'll work.

Sven Marnach Over a year ago

@drevicko That makes sense. It might be best to write that code as for x in a: numpy.random.shuffle(x) then.
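
The behaviour discussed in these comments can be reproduced with a short sketch. It assumes Python 3 and uses a seeded `default_rng` generator rather than the global `numpy.random.shuffle`, so the variable names and seed are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.tile(np.arange(1, 6), (3, 1))
original = a.copy()

# In Python 3, map() is lazy: nothing is shuffled until it is consumed.
lazy = map(rng.shuffle, a)
unchanged = np.array_equal(a, original)  # no shuffle call has run yet

# An explicit loop shuffles each row (a view into `a`) in place.
for row in a:
    rng.shuffle(row)

print(unchanged)
print(a)
```

The explicit loop also avoids building a throwaway list of `None` values, which is what `list(map(...))` and the list comprehension both do.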

user3820991 · Accepted Answer · 2023-01-06 15:21:01Z

18

For those looking at this question more recently, NumPy's random Generator provides the permuted method, which shuffles an array independently along the specified axis.

From the documentation (using random.Generator):

rng = np.random.default_rng()
x = np.arange(24).reshape(3, 8)
x
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23]])

y = rng.permuted(x, axis=1)
y
array([[ 4,  3,  6,  7,  1,  2,  5,  0],  
       [15, 10, 14,  9, 12, 11,  8, 13],
       [17, 16, 20, 21, 18, 22, 23, 19]])
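
If an in-place shuffle is wanted, as in the original question, `permuted` also accepts an `out` argument. A minimal sketch, assuming a NumPy version that ships `Generator.permuted` (the seed and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(24).reshape(3, 8)
expected_rows = x.copy()

# Passing the input as `out` shuffles each row independently, in place.
result = rng.permuted(x, axis=1, out=x)

print(x)
print(result is x)
```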

answered Jan 6, 2023 at 15:21

user3820991

3 Comments

Praveen Over a year ago

Great answer and exactly what I was looking for - this is the canonical way to do this now.

omsrisagar Jul 31 at 23:05

Note that this and Divakar's answer below do not preserve internal structure that is needed for machine learning tasks (because there is cross shuffling across axes/slices).

omsrisagar Jul 31 at 23:22

I was looking more for a solution that is similar to random.shuffle() but works on internal axes (e.g., axis=1 in a 3d array) instead of the 0th axis supported by shuffle.
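
One option that may fit this use case is `Generator.permutation` with an `axis` argument, which reorders whole sub-arrays along the chosen axis using a single shared permutation instead of shuffling every 1-D slice independently. A sketch on a small 3-D array (the shape and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(24).reshape(2, 3, 4)

# permutation(axis=1) reorders the three x[:, j, :] blocks as whole units,
# applying one shared permutation; it returns a shuffled copy.
y = rng.permutation(x, axis=1)

# Each block of y should appear unchanged somewhere in x.
blocks_intact = all(
    any(np.array_equal(y[:, j, :], x[:, k, :]) for k in range(x.shape[1]))
    for j in range(y.shape[1])
)
print(blocks_intact)
```

`rng.shuffle(x, axis=1)` is the in-place counterpart of the same operation.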
