35

Given the following NumPy array,

> a = array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5],[1, 2, 3, 4, 5]])

it's simple enough to shuffle a single row,

> shuffle(a[0])
> a
array([[4, 2, 1, 3, 5],[1, 2, 3, 4, 5],[1, 2, 3, 4, 5]])

Is it possible to use indexing notation to shuffle each of the rows independently, or do you have to iterate over the array? I had in mind something like,

> numpy.shuffle(a[:])
> a
array([[4, 2, 3, 5, 1],[3, 1, 4, 5, 2],[4, 2, 1, 3, 5]]) # Not the real output

though this clearly doesn't work.

3 Answers

32

Vectorized solution with rand+argsort trick

We could generate unique indices along the specified axis and index into the input array with advanced indexing. To generate the unique indices, we use the random-float generation + argsort trick, which gives us a vectorized solution. We also generalize it to cover generic n-dim arrays and generic axes with np.take_along_axis. The final implementation would look something like this -

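import numpy as np

def shuffle_along_axis(a, axis):
    # Sorting random floats along `axis` yields an independent permutation of indices per slice
    idx = np.random.rand(*a.shape).argsort(axis=axis)
    return np.take_along_axis(a, idx, axis=axis)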

Note that this shuffle won't be in-place and returns a shuffled copy.
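
As a quick check of the claims above, here is a minimal sketch that uses a seeded Generator in place of np.random.rand so the run is repeatable. The function name shuffle_along_axis_seeded and its rng parameter are hypothetical, introduced only for this illustration. It verifies that the input is left untouched and that each row of the result holds the same elements as the corresponding input row.

```python
import numpy as np

def shuffle_along_axis_seeded(a, axis, rng):
    # Hypothetical variant of shuffle_along_axis: random keys come from
    # a caller-supplied Generator instead of the global np.random state.
    keys = rng.random(a.shape)
    idx = keys.argsort(axis=axis)
    return np.take_along_axis(a, idx, axis=axis)

a = np.array([[18, 95, 45, 33],
              [40, 78, 31, 52],
              [75, 49, 42, 94]])
original = a.copy()

rng = np.random.default_rng(0)
shuffled = shuffle_along_axis_seeded(a, axis=1, rng=rng)

# The input array is not modified; a shuffled copy is returned.
assert np.array_equal(a, original)
# Every row keeps exactly its own elements, only reordered.
assert np.array_equal(np.sort(shuffled, axis=1), np.sort(a, axis=1))
```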

Sample run -

In [33]: a
Out[33]: 
array([[18, 95, 45, 33],
       [40, 78, 31, 52],
       [75, 49, 42, 94]])

In [34]: shuffle_along_axis(a, axis=0)
Out[34]: 
array([[75, 78, 42, 94],
       [40, 49, 45, 52],
       [18, 95, 31, 33]])

In [35]: shuffle_along_axis(a, axis=1)
Out[35]: 
array([[45, 18, 33, 95],
       [31, 78, 52, 40],
       [42, 75, 94, 49]])

2 Comments

Interesting solution! However, I ran a quick experiment and it was far slower (on the order of 1000x) than the naive solution below, which repeatedly invokes rng.shuffle. Can anyone confirm this? Why is it so slow?
@Nils I am not sure the naive solution you are referring to is still here, but one explanation would be that rng.shuffle only shuffles in place (O(n) time complexity). For this solution you have to allocate memory for the unique indices, sort with argsort (O(n log n) time complexity), and then allocate new memory for the result as well. Thus the naive solution scales better for large arrays.
24

You have to call numpy.random.shuffle() several times because you are shuffling several sequences independently. numpy.random.shuffle() works on any mutable sequence and is not actually a ufunc. The shortest and most efficient code to shuffle all rows of a two-dimensional array a separately probably is

list(map(numpy.random.shuffle, a))

Some people prefer to write this as a list comprehension instead:

[numpy.random.shuffle(x) for x in a]
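
A plain loop does the same thing without building a throwaway list of None values. The sketch below assumes a NumPy version that provides np.random.default_rng and uses its Generator.shuffle instead of numpy.random.shuffle; that swap is a choice made for this example, not part of the answer above.

```python
import numpy as np

rng = np.random.default_rng()
a = np.array([[1, 2, 3, 4, 5]] * 3)

# Iterating over a 2-D array yields one view per row,
# so shuffling each view reorders that row of `a` in place.
for row in a:
    rng.shuffle(row)
```

Because each row is a view rather than a copy, a itself is modified and nothing needs to be reassigned.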

8 Comments

Thanks, simple and clean solution.
at least for python 3.5, numpy 1.10.2, this doesn't work, a remains unchanged.
@drevicko: What dimension does your array have? This answer is for shuffling all rows of a two-dimensional array (and I'm sure it also works with your combination of Python and Numpy versions).
Aha! I see what happened: in Python 3.5, map is lazy, producing an iterator, and doesn't do the mapping until you iterate through it. If you do e.g.: for _ in map(...): pass it'll work.
@drevicko That makes sense. It might be best to write that code as for x in a: numpy.random.shuffle(x) then.
18

For those looking at this question more recently, NumPy provides the Generator.permuted method to shuffle an array independently along the specified axis.

From the NumPy documentation (using random.Generator):

rng = np.random.default_rng()
x = np.arange(24).reshape(3, 8)
x
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23]])

y = rng.permuted(x, axis=1)
y
array([[ 4,  3,  6,  7,  1,  2,  5,  0],  
       [15, 10, 14,  9, 12, 11,  8, 13],
       [17, 16, 20, 21, 18, 22, 23, 19]])
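
Two related calls are worth contrasting here. The sketch below assumes a reasonably recent NumPy in which Generator.permuted accepts an out argument and Generator.shuffle accepts an axis argument; the variable names z and w are illustrative only. Passing out=z shuffles each row independently in place, while shuffle with axis=1 reorders whole columns, so every row receives the same permutation.

```python
import numpy as np

rng = np.random.default_rng()
x = np.arange(24).reshape(3, 8)

# Independent shuffle of each row, written back into the same array.
z = x.copy()
rng.permuted(z, axis=1, out=z)

# Shuffle along axis 1 as whole slices: columns move as units,
# so all rows share one permutation.
w = x.copy()
rng.shuffle(w, axis=1)

# In `w` the columns stay intact, so rows still differ by exactly 8.
assert np.all(w[1] - w[0] == 8)
```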

3 Comments

Great answer and exactly what I was looking for - this is the canonical way to do this now.
Note that this and Divakar's answer below do not preserve internal structure that is needed for machine learning tasks (because there is cross shuffling across axes/slices).
I was looking more for a solution that is similar to random.shuffle() but works on internal axes (e.g., axis=1 in a 3d array) instead of the 0th axis supported by shuffle.
