Shuffle columns of an array with Numpy

Question

Let's say I have an array r of dimension (n, m). I would like to shuffle the columns of that array.

If I use numpy.random.shuffle(r), it shuffles the rows. How can I shuffle only the columns, so that, for example, the first column becomes the second one and the third becomes the first, at random?

Example:

input:

array([[  1,  20, 100],
       [  2,  31, 401],
       [  8,  11, 108]])

output:

array([[ 20,   1, 100],
       [ 31,   2, 401],
       [ 11,   8, 108]])

Maxime Chéramy · Accepted Answer · 2021-04-22 09:20:38Z

29

One approach is to shuffle the transposed array:

 np.random.shuffle(np.transpose(r))

Another approach (see YXD's answer https://stackoverflow.com/a/20546567/1787973) is to generate a random permutation of the column indices and retrieve the columns in that order:

 r = r[:, np.random.permutation(r.shape[1])]

Performance-wise, the second approach is faster.
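A minimal sketch of both approaches on the question's example array. The names `r_inplace` and `r_copy` are introduced here for illustration only. The first approach reorders its array in place, while the second builds a new array.

```python
import numpy as np

r = np.array([[1, 20, 100],
              [2, 31, 401],
              [8, 11, 108]])

# Approach 1: shuffle the transposed view. This reorders the columns of
# r_inplace itself and returns None.
r_inplace = r.copy()
np.random.shuffle(r_inplace.T)

# Approach 2: index the columns with a random permutation. This builds a
# new array and leaves r untouched.
r_copy = r[:, np.random.permutation(r.shape[1])]

print(r_inplace)
print(r_copy)
```

In both results each column stays intact; only the order of the columns changes.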

edited Apr 22, 2021 at 9:20

answered Dec 12, 2013 at 14:41

Maxime Chéramy

13 Comments

user2357112 Over a year ago

It is. I recommend r.T for transpose, though.

Maxime Chéramy Over a year ago

@user2357112 is r.T the exact same thing as np.transpose(r) but shorter?

user2357112 Over a year ago

Effectively identical. There's a very slight difference for 1-d arrays, but you probably won't be using either T or transpose for 1-d arrays.

Reti43 Over a year ago

Since numpy.random.shuffle shuffles the rows, taking the transpose of a matrix first means you effectively shuffle the columns. Then you transpose back.

user2357112 Over a year ago

@Matt: This is an in-place operation on a view of the original array. It does not create a new, shuffled array, so there's no need to transpose the result.
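The point about views can be checked directly. The sketch below uses an arbitrary 3x4 array (an assumption for illustration) and `np.shares_memory` to show that `r.T` is not a copy.

```python
import numpy as np

r = np.arange(12).reshape(3, 4)
before = r.copy()

# r.T is a view: it shares memory with r rather than copying it.
print(np.shares_memory(r, r.T))

# shuffle works in place and returns None, so there is nothing to assign.
result = np.random.shuffle(r.T)
print(result)

# r keeps its (3, 4) shape; only the order of its columns may have changed.
print(r.shape)
print(r)
```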

YXD · Accepted Answer · 2013-12-12 14:46:21Z

6

For a general axis you could follow the pattern:

>>> import numpy as np
>>> 
>>> a = np.array([[  1,  20, 100, 4],
...               [  2,  31, 401, 5],
...               [  8,  11, 108, 6]])
>>> 
>>> print(a[:, np.random.permutation(a.shape[1])])
[[  4   1  20 100]
 [  5   2  31 401]
 [  6   8  11 108]]
>>> 
>>> print(a[np.random.permutation(a.shape[0]), :])
[[  1  20 100   4]
 [  2  31 401   5]
 [  8  11 108   6]]
>>>
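One way to wrap this pattern for any axis is sketched below. The helper name `permute_along_axis` and its `rng` parameter are hypothetical, not part of NumPy; `np.take` with `axis=` is used as the axis-generic form of the indexing shown above.

```python
import numpy as np

def permute_along_axis(a, axis, rng=None):
    """Return a copy of `a` with whole slices reordered along `axis`."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(a.shape[axis])
    # np.take(a, order, axis=1) matches a[:, order]; axis=0 matches a[order, :].
    return np.take(a, order, axis=axis)

a = np.array([[1, 20, 100, 4],
              [2, 31, 401, 5],
              [8, 11, 108, 6]])

print(permute_along_axis(a, axis=1))  # columns reordered
print(permute_along_axis(a, axis=0))  # rows reordered
```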

answered Dec 12, 2013 at 14:46

YXD

1 Comment

RMurphy Over a year ago

Something is wrong with the code, shuffling the columns should not return the original matrix in general.
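Note that the unchanged output in the answer comes from the row permutation, and a random permutation can legitimately be the identity. A small sketch using only the standard library counts how often that happens for three rows; `n_rows` and the other names are illustrative.

```python
from itertools import permutations

n_rows = 3
all_orders = list(permutations(range(n_rows)))
identity = tuple(range(n_rows))

# Exactly one of the 3! = 6 possible orders leaves the rows where they were.
print(len(all_orders))
print(all_orders.count(identity) / len(all_orders))
```

So with three rows, roughly one run in six returns the original matrix.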

David Parks · Accepted Answer · 2018-02-10 17:52:36Z

5

So, one step further from your answer:

Edit: I very easily could be mistaken how this is working, so I'm inserting my understanding of the state of the matrix at each step.

r == 1 2 3
     4 5 6
     6 7 8

r = np.transpose(r)  

r == 1 4 6
     2 5 7
     3 6 8           # Columns are now rows

np.random.shuffle(r)

r == 2 5 7
     3 6 8 
     1 4 6           # Columns-as-rows are shuffled

r = np.transpose(r)  

r == 2 3 1
     5 6 4
     7 8 6           # Columns are columns again, shuffled.

which would then be back in the proper shape, with the columns rearranged.

The transpose of the transpose of a matrix == that matrix, or, [A^T]^T == A. So, you'd need to do a second transpose after the shuffle (because a transpose is not a shuffle) in order for it to be in its proper shape again.

Edit: The OP's answer skips storing the transpositions and instead lets the shuffle operate on a transposed view of r, which reorders the columns of r itself.
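A sketch comparing the explicit transpose, shuffle, transpose-back route with shuffling the transposed view directly. Seeding the global generator with `np.random.seed(0)` is an assumption made only so that both routes draw the same random order.

```python
import numpy as np

r = np.array([[1, 2, 3],
              [4, 5, 6],
              [6, 7, 8]])

# Route A: transpose, shuffle, transpose back (the steps in this answer).
a = r.copy()
np.random.seed(0)
a_t = np.transpose(a)       # a view of a, not a copy
np.random.shuffle(a_t)      # shuffles a through the view
a_back = np.transpose(a_t)

# Route B: shuffle the transposed view directly.
b = r.copy()
np.random.seed(0)
np.random.shuffle(b.T)

print(np.array_equal(a_back, b))
print(np.array_equal(a, b))
```

Because `np.transpose` returns a view, `a` is already column-shuffled after the shuffle step, and the second transpose only recovers a handle with the original orientation.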

edited Feb 10, 2018 at 17:52

David Parks

answered Dec 12, 2013 at 14:53

Matthew

5 Comments

Maxime Chéramy Over a year ago

np.random.shuffle does not return the array.

Matthew Over a year ago

So I see, edited. Regardless, the final step is needed to return your matrix to its original shape.

user2357112 Over a year ago

@Matt: No, no it's not. transpose returns a view of the original array. Once you shuffle the transposed array, the original is shuffled in the desired manner. There is no need to transpose twice.

Matthew Over a year ago

@user2357112 I added a sample matrix to each step to illustrate my thought pattern. It's been a decade since my last linear algebra class, but I'm pretty sure this is what the documentation for np.transpose and np.random.shuffle indicates is going on.

Matthew Over a year ago

@user2357112 read your comment to the question, got it now, thanks.

patapouf_ai · Accepted Answer · 2017-01-26 15:18:21Z

2

In general if you want to shuffle a numpy array along axis i:

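import numpy as np

def shuffle(x, axis=0):
    # Build an axis order that swaps axis 0 with the target axis.
    t = np.arange(x.ndim)
    t[0] = axis
    t[axis] = 0
    # Work on a copy so the caller's array is left untouched.
    xt = np.transpose(x.copy(), t)
    # np.random.shuffle permutes along the first axis, in place.
    np.random.shuffle(xt)
    # The swap is its own inverse, so the same order restores the layout.
    shuffled_x = np.transpose(xt, t)
    return shuffled_x

shuffle(array, axis=i)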
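A usage sketch on a 3-D array. To keep it self-contained it uses a compact variant built on `np.swapaxes` instead of the explicit axis list above; the name `shuffle_along_axis` is hypothetical and the variant is a substitute for illustration, not the code from this answer.

```python
import numpy as np

def shuffle_along_axis(x, axis=0):
    """Return a shuffled copy of x; whole slices move along `axis`."""
    out = x.copy()
    # swapaxes returns a view of `out` with `axis` in front, so the
    # in-place shuffle below reorders `out` along `axis`.
    np.random.shuffle(np.swapaxes(out, 0, axis))
    return out

x = np.arange(24).reshape(2, 3, 4)
y = shuffle_along_axis(x, axis=2)

print(y.shape)
print(y[0])
```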

answered Jan 26, 2017 at 15:18

patapouf_ai

Sandip Saha · Accepted Answer · 2018-07-11 05:03:51Z

2

>>> print(s0)
[[0. 1. 0. 1.]
 [0. 1. 0. 0.]
 [0. 1. 0. 1.]
 [0. 0. 0. 1.]]
>>> print(np.random.permutation(s0.T).T)
[[1. 0. 1. 0.]
 [0. 0. 1. 0.]
 [1. 0. 1. 0.]
 [1. 0. 0. 0.]]

np.random.permutation() permutes the rows (the first axis), so transposing before and after permutes the columns.
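Unlike np.random.shuffle, np.random.permutation returns a shuffled copy. The sketch below rebuilds `s0` from the printed values above and checks that it is left unchanged; the names `original` and `shuffled` are illustrative.

```python
import numpy as np

s0 = np.array([[0., 1., 0., 1.],
               [0., 1., 0., 0.],
               [0., 1., 0., 1.],
               [0., 0., 0., 1.]])
original = s0.copy()

# permutation() shuffles a copy along the first axis, so transposing before
# and after yields a column-shuffled array without modifying s0.
shuffled = np.random.permutation(s0.T).T

print(np.array_equal(s0, original))
print(shuffled)
```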

answered Jul 11, 2018 at 5:03

Sandip Saha

Ataxias · Accepted Answer · 2021-03-23 05:36:48Z

2

There is another way, which does not use transposition and is apparently faster:

np.take(r, np.random.permutation(r.shape[1]), axis=1, out=r)

CPU times: user 1.14 ms, sys: 1.03 ms, total: 2.17 ms. Wall time: 3.89 ms

The approach in other answers: np.random.shuffle(r.T)

CPU times: user 2.24 ms, sys: 0 ns, total: 2.24 ms Wall time: 5.08 ms

I used r = np.arange(64*1000).reshape(64, 1000) as an input.
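A rough way to repeat the comparison with `timeit` on the same input. The function names, the repeat counts, and the choice to report the best run are assumptions of this sketch; results vary by machine and NumPy version, so no timings are claimed here.

```python
import timeit
import numpy as np

r = np.arange(64 * 1000).reshape(64, 1000)

def with_take():
    np.take(r, np.random.permutation(r.shape[1]), axis=1, out=r)

def with_transpose():
    np.random.shuffle(r.T)

def with_indexing():
    return r[:, np.random.permutation(r.shape[1])]

timings = {}
for fn in (with_take, with_transpose, with_indexing):
    # Best of 5 runs, 20 calls per run, to dampen noise.
    timings[fn.__name__] = min(timeit.repeat(fn, number=20, repeat=5))

for name, seconds in timings.items():
    print(f"{name}: {seconds / 20 * 1e3:.3f} ms per call")
```

Passing a callable to `timeit.repeat` re-runs the shuffle on every call, so each measurement includes drawing a fresh permutation.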

edited Mar 23, 2021 at 5:36

answered Mar 23, 2021 at 5:29

Ataxias

4 Comments

Maxime Chéramy Over a year ago

That's a very interesting approach! I recommend using timeit to prove that it's really faster. And according to my tests, it seems to be the case!

Maxime Chéramy Over a year ago

r[:, np.random.permutation(r.shape[1])] seems to be even faster than using np.take

Ataxias Over a year ago

@MaximeChéramy I used %%time because timeit was caching the intermediate results and it wasn't clear what was happening. r = r[:, np.random.permutation(r.shape[1])] is nice! But I have the impression that np.take with out=r uses less memory. Can you use %memit to check?

Ataxias Over a year ago

Based on my timing experiments, in fact r[:, np.random.permutation(r.shape[1])] does not appear to be faster than np.take. Most likely the speed-up that you see is because of caching.

isarandi · Accepted Answer · 2023-10-26 19:45:17Z

0

numpy.random.Generator.shuffle has an axis parameter. This will shuffle in place:

rng = np.random.default_rng()
rng.shuffle(arr, axis=1)

https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.shuffle.html
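A short sketch on the question's example array. The seed value is an arbitrary choice for repeatability, and `Generator.permutation` is shown alongside as the variant that returns a copy instead of working in place.

```python
import numpy as np

arr = np.array([[1, 20, 100],
                [2, 31, 401],
                [8, 11, 108]])
rng = np.random.default_rng(seed=42)  # seed chosen only for repeatability

# permutation(..., axis=1) returns a shuffled copy and leaves arr alone.
shuffled_copy = rng.permutation(arr, axis=1)

# shuffle(..., axis=1) reorders the columns of arr itself and returns None.
rng.shuffle(arr, axis=1)

print(shuffled_copy)
print(arr)
```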

answered Oct 26, 2023 at 19:45

isarandi
