26

Let's say I have an array r of dimension (n, m). I would like to shuffle the columns of that array.

If I use numpy.random.shuffle(r), it shuffles the rows. How can I shuffle only the columns, so that, for example, the first column becomes the second one, the third becomes the first, and so on, at random?

Example:

input:

array([[  1,  20, 100],
       [  2,  31, 401],
       [  8,  11, 108]])

output:

array([[ 20,   1, 100],
       [ 31,   2, 401],
       [ 11,   8, 108]])

7 Answers

29

One approach is to shuffle the transposed array:

 np.random.shuffle(np.transpose(r))

Another approach (see YXD's answer https://stackoverflow.com/a/20546567/1787973) is to generate a random permutation of the column indices and retrieve the columns in that order:

 r = r[:, np.random.permutation(r.shape[1])]

Performance-wise, the second approach is faster.
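
A minimal sketch of both approaches, using the example array from the question. The seed and the names r_inplace and r_copy are illustrative assumptions, not part of the original answer; the first approach modifies the array in place, while the second returns a new array.

```python
import numpy as np

np.random.seed(0)  # assumed seed, only to make the run repeatable

r = np.array([[1, 20, 100],
              [2, 31, 401],
              [8, 11, 108]])

# Approach 1: shuffle the transposed view in place.
# r_inplace.T shares memory with r_inplace, so its columns get reordered.
r_inplace = r.copy()
np.random.shuffle(r_inplace.T)

# Approach 2: index the columns with a random permutation (returns a new array).
r_copy = r[:, np.random.permutation(r.shape[1])]

print(r_inplace)
print(r_copy)
```

In both results every column is still one of the original columns; only their order changes.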


13 Comments

It is. I recommend r.T for transpose, though.
@user2357112 is r.T the exact same thing as np.transpose(r) but shorter?
Effectively identical. There's a very slight difference for 1-d arrays, but you probably won't be using either T or transpose for 1-d arrays.
Since numpy.random.shuffle shuffles the rows, by taking the transpose of a matrix you effectively shuffle the columns. Then you transpose back.
@Matt: This is an in-place operation on a view of the original array. It does not create a new, shuffled array, so there's no need to transpose the result.
6

For a general axis you could follow the pattern:

>>> import numpy as np
>>> 
>>> a = np.array([[  1,  20, 100, 4],
...               [  2,  31, 401, 5],
...               [  8,  11, 108, 6]])
>>> 
>>> print(a[:, np.random.permutation(a.shape[1])])
[[  4   1  20 100]
 [  5   2  31 401]
 [  6   8  11 108]]
>>> 
>>> print(a[np.random.permutation(a.shape[0]), :])
[[  1  20 100   4]
 [  2  31 401   5]
 [  8  11 108   6]]
>>> 
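
The same indexing pattern can be wrapped in a small helper that works for any axis. This is only a sketch: the function name permuted_along_axis is hypothetical, and it uses np.take with an axis argument instead of slice indexing so that the axis can be passed as a parameter.

```python
import numpy as np

def permuted_along_axis(a, axis):
    """Return a copy of `a` with its slices along `axis` in random order."""
    perm = np.random.permutation(a.shape[axis])
    # np.take(a, perm, axis=1) is equivalent to a[:, perm] for a 2-D array.
    return np.take(a, perm, axis=axis)

a = np.arange(24).reshape(2, 3, 4)
b = permuted_along_axis(a, axis=2)

print(b.shape)  # same shape as a; only the order along axis 2 differs
```

Because a single permutation is applied to the whole axis, whole slices move together rather than individual elements.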

1 Comment

Something is wrong with the code, shuffling the columns should not return the original matrix in general.
5

So, one step further from your answer:

Edit: I very easily could be mistaken how this is working, so I'm inserting my understanding of the state of the matrix at each step.

r == 1 2 3
     4 5 6
     6 7 8

r = np.transpose(r)  

r == 1 4 6
     2 5 7
     3 6 8           # Columns are now rows

np.random.shuffle(r)

r == 2 5 7
     3 6 8 
     1 4 6           # Columns-as-rows are shuffled

r = np.transpose(r)  

r == 2 3 1
     5 6 4
     7 8 6           # Columns are columns again, shuffled.

which would then be back in the proper shape, with the columns rearranged.

The transpose of the transpose of a matrix == that matrix, or, [A^T]^T == A. So, you'd need to do a second transpose after the shuffle (because a transpose is not a shuffle) in order for it to be in its proper shape again.

Edit: The OP's answer skips storing the transpositions and instead lets the shuffle operate on r as if it were transposed.

5 Comments

np.random.shuffle does not return the array.
So I see, edited. Regardless, the final step is needed to return your matrix to its original shape.
@Matt: No, no it's not. transpose returns a view of the original array. Once you shuffle the transposed array, the original is shuffled in the desired manner. There is no need to transpose twice.
@user2357112 I added a sample matrix to each step to illustrate my thought pattern. It's been a decade since my last linear algebra class, but I'm pretty sure this is what the documentation for np.transpose and np.random.shuffle indicates is going on.
@user2357112 read your comment to the question, got it now, thanks.
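
The point made in the comments, that the transpose is a view and so no second transpose is needed, can be checked directly. A minimal sketch using the sample matrix from this answer; the variable name original is an assumption added for the comparison.

```python
import numpy as np

r = np.array([[1, 2, 3],
              [4, 5, 6],
              [6, 7, 8]])
original = r.copy()

# The transpose is a view on the same memory, not a copy.
print(np.shares_memory(r, r.T))

# Shuffling the rows of the view reorders the columns of r itself.
np.random.shuffle(r.T)

# r was never reassigned, so its shape is unchanged and no transpose back is needed.
print(r.shape)
print(r)
```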
2

In general if you want to shuffle a numpy array along axis i:

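import numpy as np

def shuffle(x, axis=0):
    # Build an axis order that swaps the target axis with axis 0.
    t = np.arange(x.ndim)
    t[0] = axis
    t[axis] = 0
    # Work on a copy so the caller's array is left untouched.
    xt = np.transpose(x.copy(), t)
    # np.random.shuffle permutes along the first axis, in place.
    np.random.shuffle(xt)
    # The swap is its own inverse, so the same order restores the layout.
    return np.transpose(xt, t)

shuffle(array, axis=i)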


2
>>> print(s0)
[[0. 1. 0. 1.]
 [0. 1. 0. 0.]
 [0. 1. 0. 1.]
 [0. 0. 0. 1.]]
>>> print(np.random.permutation(s0.T).T)
[[1. 0. 1. 0.]
 [0. 0. 1. 0.]
 [1. 0. 1. 0.]
 [1. 0. 0. 0.]]

np.random.permutation() permutes the rows (the first axis) and returns a shuffled copy, so transposing before and after permutes the columns.
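
Unlike np.random.shuffle, np.random.permutation leaves its input alone. A short sketch that checks this with the array shown above; the names before and shuffled are assumptions added for illustration.

```python
import numpy as np

s0 = np.array([[0., 1., 0., 1.],
               [0., 1., 0., 0.],
               [0., 1., 0., 1.],
               [0., 0., 0., 1.]])
before = s0.copy()

# permutation() copies s0.T and shuffles the rows of that copy;
# transposing the result gives an array with permuted columns.
shuffled = np.random.permutation(s0.T).T

# s0 itself has not been modified.
print(np.array_equal(s0, before))
print(shuffled)
```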


2

There is another way, which does not use transposition and is apparently faster:

np.take(r, np.random.permutation(r.shape[1]), axis=1, out=r)

CPU times: user 1.14 ms, sys: 1.03 ms, total: 2.17 ms. Wall time: 3.89 ms

The approach in other answers: np.random.shuffle(r.T)

CPU times: user 2.24 ms, sys: 0 ns, total: 2.24 ms Wall time: 5.08 ms

I used r = np.arange(64*1000).reshape(64, 1000) as an input.
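
For a repeatable comparison, the three variants discussed here can be timed with the standard timeit module. This is a rough sketch, not the measurement used above: the wrapper function names and the repeat count of 20 are assumptions, and the numbers will vary with the machine and NumPy version.

```python
import timeit
import numpy as np

r = np.arange(64 * 1000).reshape(64, 1000)

def with_take(r):
    # Permute columns via np.take, writing the result back into r.
    np.take(r, np.random.permutation(r.shape[1]), axis=1, out=r)

def with_shuffle_T(r):
    # Shuffle the transposed view in place.
    np.random.shuffle(r.T)

def with_indexing(r):
    # Build a new array; r itself is left unchanged.
    return r[:, np.random.permutation(r.shape[1])]

n_calls = 20  # assumed repeat count, kept small so the script runs quickly
for fn in (with_take, with_shuffle_T, with_indexing):
    seconds = timeit.timeit(lambda: fn(r), number=n_calls)
    print(f"{fn.__name__}: {seconds / n_calls * 1e3:.3f} ms per call")
```

Each call draws a fresh permutation, so the cost of np.random.permutation is included in every measurement.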

4 Comments

That's a very interesting approach! I recommend using timeit to confirm that it's really faster. According to my tests, that does seem to be the case!
r[:, np.random.permutation(r.shape[1])] seems to be even faster than using np.take
@MaximeChéramy I used %%time because timeit was caching the intermediate results and it wasn't clear what was happening. r = r[:, np.random.permutation(r.shape[1])] is nice! But I have the impression that np.take with out=r uses less memory. Can you use %memit to check?
Based on my timing experiments, in fact r[:, np.random.permutation(r.shape[1])] does not appear to be faster than np.take. Most likely the speed-up that you see is because of caching.
0

numpy.random.Generator.shuffle has an axis parameter. This will shuffle in place:

rng = np.random.default_rng()
rng.shuffle(arr, axis=1)

https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.shuffle.html
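
A slightly fuller sketch with the question's array. The seed is an assumption for repeatability, and Generator.permutation with axis=1 is shown alongside as a variant that returns a shuffled copy instead of modifying the array.

```python
import numpy as np

rng = np.random.default_rng(12345)  # assumed seed, only for repeatability

arr = np.array([[1, 20, 100],
                [2, 31, 401],
                [8, 11, 108]])
original = arr.copy()

# Copying variant: returns a new array, arr is unchanged.
shuffled_copy = rng.permutation(arr, axis=1)

# In-place variant: reorders the columns of arr itself and returns None.
rng.shuffle(arr, axis=1)

print(shuffled_copy)
print(arr)
```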
