Merging one NumPy array into new NumPy array with equal amount of values

Question

Let's say that I have a numpy array consisting of 100.000 zeroes and 10.000.000 ones.

How does one split/merge this array into a new array, where there are equally many ones and zeroes?

UPDATE

The goal is to take 100.000 zeroes and 100.000 ones from the big array, and create a new array where 50% of the array is zeroes and the other 50% are ones.

Can you be more specific about your goal? My first reaction to your question as posed is "just make a new array with the shape and number of ones and zeros that you want".
– johnpaton, Commented Dec 10, 2018 at 13:13
@johnpaton Sorry, yes, the goal is to take equally many zeroes and ones from the huge array I have now and put them into a new array, where the number of zeroes and the number of ones are exactly the same.
– user10705497, Commented Dec 10, 2018 at 13:14
@handras I am quite aware of that. I want to take 100.000 zeroes and 100.000 ones and make that an array of its own.
– user10705497, Commented Dec 10, 2018 at 13:15
All 1s are the same, so there's no need to take anything from the existing array. You can just do np.hstack([np.zeros(100000),np.ones(100000)])
– johnpaton, Commented Dec 10, 2018 at 13:23

Paul Panzer · Accepted Answer · 2018-12-10 16:10:20Z

2

From the comments I take it you need the indices of all the zeros and of a random 100'000 of the ones.

# make example
>>> import numpy as np
>>> A = np.repeat((0,1), (10**5, 10**7))
>>> np.random.shuffle(A)

# convert to bool
>>> m = A.astype(bool)
# put an additional 100'000 zeros ...
>>> B = np.repeat((False, True), (10**5, 10**7 - 10**5))
>>> np.random.shuffle(B)
# ... at positions that used to be one
>>> m[m] = B
# and get the indices of zeros
>>> idx, = np.where(~m)

# check
>>> idx
array([       1,       22,      180, ..., 10099911, 10099950, 10099969])
>>> len(idx)
200000
>>> A[idx]
array([0, 1, 1, ..., 1, 1, 0])
>>> A[idx].sum()
100000
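
The same index set can probably be reached more directly. The following is only a sketch, not the method used above: it collects the positions of the zeros with np.flatnonzero and samples an equal number of positions of ones without replacement through Generator.choice. The helper name balanced_indices and its seed parameter are made up for illustration, and the array is smaller than in the question so that it runs quickly.

```python
import numpy as np

def balanced_indices(a, seed=None):
    """Return sorted indices of all zeros plus an equal number of random ones.

    Hypothetical helper: assumes `a` is 1-D, holds only 0s and 1s,
    and contains at least as many ones as zeros.
    """
    rng = np.random.default_rng(seed)
    zero_idx = np.flatnonzero(a == 0)   # positions of every zero
    one_idx = np.flatnonzero(a == 1)    # positions of every one
    # Sample as many one-positions as there are zeros, without repeats.
    picked = rng.choice(one_idx, size=zero_idx.size, replace=False)
    return np.sort(np.concatenate((zero_idx, picked)))

# Scaled-down stand-in for the array in the question.
A = np.repeat((0, 1), (1_000, 100_000))
np.random.default_rng(0).shuffle(A)

idx = balanced_indices(A, seed=1)
balanced = A[idx]   # half zeros, half ones, in their original order
```

Sorting the indices keeps the selected values in the order they had in A; skip the np.sort call if that order does not matter.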


Eric Duminil · Answer · 2018-12-10 13:23:40Z

0

If I understand correctly, all you need is N, the smaller of the two counts (the number of zeros and the number of ones in the original array).

Once you have it, you don't need to touch the original array at all. You can simply create a new one and shuffle it this way:

import numpy as np
N = 10
a = np.concatenate((np.ones(N), np.zeros(N)))
np.random.shuffle(a)

Here's an example in console:

>>> import numpy as np
>>> N = 10
>>> a = np.concatenate((np.ones(N), np.zeros(N)))
>>> a
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0.])
>>> np.random.shuffle(a)
>>> a
array([0., 0., 1., 1., 1., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0., 1., 1.,
       0., 1., 1.])
>>> len(a)
20
>>> sum(a)
10.0
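
If N is not known up front, it can be read off the existing array. A minimal sketch, assuming the array holds only the integers 0 and 1; the names big, counts and n are illustrative, and the array is smaller than the one in the question:

```python
import numpy as np

# Scaled-down stand-in for the large array from the question.
big = np.repeat((0, 1), (1_000, 100_000))

# counts[0] is the number of zeros, counts[1] the number of ones.
counts = np.bincount(big, minlength=2)
n = int(counts.min())   # the smaller count decides the balanced size

# Build the balanced array from scratch, keeping the original dtype.
balanced = np.concatenate((np.zeros(n, dtype=big.dtype),
                           np.ones(n, dtype=big.dtype)))
np.random.default_rng().shuffle(balanced)   # shuffles in place
```

Passing dtype=big.dtype keeps the result an integer array instead of the float array that np.ones(N) and np.zeros(N) produce by default.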
