0

Let's say that I have a NumPy array consisting of 100.000 zeroes and 10.000.000 ones.

How can I build a new array from it that contains equally many ones and zeroes?

UPDATE

The goal is to take 100.000 zeroes and 100.000 ones from the big array, and create a new array where 50% of the array is zeroes and the other 50% are ones.

8
  • Can you be more specific about your goal? My first reaction to your question as posed is "just make a new array with the shape and number of ones and zeros that you want" Commented Dec 10, 2018 at 13:13
  • 1
    You can't do that, because 100.000 != 10.000.000 Commented Dec 10, 2018 at 13:14
  • @johnpaton Sorry, yes, the goal is to take equally many values from the huge array I have now, and split it into a new array, where the amount of zeroes and ones are exactly the same. Commented Dec 10, 2018 at 13:14
  • @handras I am quite aware of that. I want to take 100.000 zeroes and 100.000 ones and make that an array of its own. Commented Dec 10, 2018 at 13:15
  • 1
    All 1s are the same, so there's no need to take anything from the existing array. You can just do np.hstack([np.zeros(100000),np.ones(100000)]) Commented Dec 10, 2018 at 13:23

2 Answers

2

From the comments I take it you need the indices of all the zeros and of a random 100'000 of the ones.

# make example
>>> import numpy as np
>>> A = np.repeat((0,1), (10**5, 10**7))
>>> np.random.shuffle(A)

# convert to bool
>>> m = A.astype(bool)
# put an additional 100'000 zeros ...
>>> B = np.repeat((False, True), (10**5, 10**7 - 10**5))
>>> np.random.shuffle(B)
# ... at positions that used to be one
>>> m[m] = B
# and get the indices of zeros
>>> idx, = np.where(~m)

# check
>>> idx
array([       1,       22,      180, ..., 10099911, 10099950, 10099969])
>>> len(idx)
200000
>>> A[idx]
array([0, 1, 1, ..., 1, 1, 0])
>>> A[idx].sum()
100000
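
The same selection can also be written with explicit index arrays, which may be easier to read than the mask trick above. This is only a sketch under a few assumptions: it uses `np.random.default_rng` with an arbitrary seed, a smaller stand-in array so it runs quickly, and the names `zero_idx`, `one_idx`, `picked_ones` and `balanced` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only so the sketch is reproducible

# Smaller stand-in for the array in the question: 1,000 zeros and 100,000 ones.
A = np.repeat((0, 1), (10**3, 10**5))
rng.shuffle(A)

zero_idx = np.flatnonzero(A == 0)  # positions of every zero
one_idx = np.flatnonzero(A == 1)   # positions of every one

# Draw as many ones as there are zeros, without replacement.
picked_ones = rng.choice(one_idx, size=zero_idx.size, replace=False)

# Sort so the selected elements keep their original relative order.
idx = np.sort(np.concatenate((zero_idx, picked_ones)))
balanced = A[idx]
```

Keeping `idx` around is useful if the zeros and ones are labels for other data, since the same indices can then be applied to the matching feature array.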

0

If I understand correctly, you only need N, the smaller of the two counts (the number of zeros and the number of ones).

Once you have it, you don't need to touch the original array; you can simply create a new one and shuffle it like this:

import numpy as np
N = 10
a = np.concatenate((np.ones(N), np.zeros(N)))
np.random.shuffle(a)

Here's an example in console:

>>> import numpy as np
>>> N = 10
>>> a = np.concatenate((np.ones(N), np.zeros(N)))
>>> a
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0.])
>>> np.random.shuffle(a)
>>> a
array([0., 0., 1., 1., 1., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0., 1., 1.,
       0., 1., 1.])
>>> len(a)
20
>>> sum(a)
10.0
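
If N should come from the existing array rather than being hard-coded, one possible way is to count the values first. This is a sketch that assumes the array holds only the integers 0 and 1; the stand-in array `A`, the seed, and the use of `np.bincount` are my own choices, not part of the snippet above.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only so the sketch is reproducible

# Smaller stand-in for the original array: 1,000 zeros and 100,000 ones.
A = np.repeat((0, 1), (10**3, 10**5))

# counts[0] is the number of zeros, counts[1] the number of ones.
counts = np.bincount(A, minlength=2)
N = int(counts.min())  # size of the smaller class

# Build the balanced array with the same dtype as the original, then shuffle it.
a = np.concatenate((np.ones(N, dtype=A.dtype), np.zeros(N, dtype=A.dtype)))
rng.shuffle(a)
```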
