Adding randomization to numpy function array_split

Question

Suppose we have an array arr and want to divide it into pieces while preserving the order of its elements. This is easily done with np.array_split:

import numpy as np
arr = np.array([0,1,2,3,4,5,6,7,8])
pieces = 3
np.array_split(arr,pieces)
>>> [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]

If arr.size % pieces != 0, the output of np.array_split will be uneven:

arr = np.array([0,1,2,3,4,5,6,7])
pieces = 3
np.array_split(arr,pieces)
>>> [array([0, 1, 2]), array([3, 4, 5]), array([6, 7])]

I am wondering what the best way is to add randomization to this procedure, so that each of the following outputs occurs with equal probability:

>>> [array([0, 1]), array([2, 3, 4]), array([5, 6, 7])]
>>> [array([0, 1, 2]), array([3, 4]), array([5, 6, 7])]
>>> [array([0, 1, 2]), array([3, 4, 5]), array([6, 7])]

I am interested in a generalized solution that also works for other combinations of array size and number of pieces, for example:

arr = np.array([0,1,2,3,4,5,6,7,8,9])
pieces = 6

You can give array_split a list of indices instead of just one number. Then it's up to you to figure out the different splits, e.g. np.array_split(x, [2, 5]), or with [3, 5] or [3, 6].
– hpaulj, Commented Nov 14, 2022 at 17:58
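To illustrate the comment above, here is a minimal sketch of passing explicit cut positions to np.array_split. The array x and the loop are assumptions made for illustration; the three index lists are the ones named in the comment, applied to the 8-element example from the question.

```python
import numpy as np

# Assumed example input: the 8-element array from the question.
x = np.arange(8)

# Each list holds the positions at which the array is cut,
# so two cut positions produce three pieces.
for cut_points in ([2, 5], [3, 5], [3, 6]):
    parts = np.array_split(x, cut_points)
    print(cut_points, [p.tolist() for p in parts])
```

These three index lists correspond to the three desired outputs listed in the question, so choosing one of them at random would cover this particular case; the general problem is generating such index lists for any size and piece count.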

w-m · Accepted Answer · 2022-11-14 19:44:20Z

import numpy as np

def random_arr_split(arr, n):
    # NumPy doc: For an array of length l that should be split into n sections,
    # it returns l % n sub-arrays of size l//n + 1 and the rest of size l//n
    piece_lens = [arr.size // n + 1] * (arr.size % n) + [arr.size // n] * (n - arr.size % n)
    # shuffle the piece lengths so the larger pieces land in random positions
    piece_lens_shuffled = np.random.permutation(piece_lens)

    # the running sum of the lengths gives the split positions;
    # drop the last element, which is the end of the array,
    # otherwise we would get an empty array at the end
    split_indices = np.cumsum(piece_lens_shuffled)[:-1]
    return np.array_split(arr, split_indices)
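As a rough check of the approach above, the sketch below draws many random splits of the 8-element example and tallies the resulting piece-length patterns. It is a variant for illustration, not the answer's exact code: the optional rng parameter, the seed, and the number of trials are assumptions added here so the run is reproducible.

```python
import numpy as np
from collections import Counter

def random_arr_split(arr, n, rng=None):
    # rng is a hypothetical optional parameter (not in the original answer)
    # that allows passing a seeded generator for reproducible results.
    rng = np.random.default_rng() if rng is None else rng
    base, extra = divmod(arr.size, n)
    # extra pieces of size base + 1, the remaining pieces of size base
    piece_lens = np.array([base + 1] * extra + [base] * (n - extra))
    # shuffle the lengths, then turn them into cut positions
    split_indices = np.cumsum(rng.permutation(piece_lens))[:-1]
    return np.array_split(arr, split_indices)

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
arr = np.arange(8)

# Tally how often each pattern of piece lengths occurs.
counts = Counter(
    tuple(len(p) for p in random_arr_split(arr, 3, rng)) for _ in range(3000)
)
for pattern, count in sorted(counts.items()):
    print(pattern, count)
```

Because a uniformly random permutation of the length list makes every distinct arrangement of lengths equally likely, the tallies for the three patterns should come out roughly equal. The same function should also handle the 10-element, 6-piece case from the question, where four pieces have two elements and two pieces have one.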
