
I want to randomly produce an array of n ones and m zeros.

I thought of this solution:

  1. produce the ones array (np.ones)
  2. produce the zeros array (np.zeros)
  3. combine them to one array (np.hstack)
  4. shuffle the resulting array (np.random.shuffle)
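Put together, the four steps above look like this (a minimal sketch, assuming n and m are already defined; small example sizes used here for illustration):

```python
import numpy as np

n, m = 4, 3  # example sizes
a = np.hstack((np.ones(n), np.zeros(m)))  # n ones followed by m zeros
np.random.shuffle(a)                      # shuffle in place
print(int(a.sum()))  # 4 (the number of ones survives the shuffle)
```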

This doesn't seem very natural as a solution. Any more Pythonic ideas?

  • Do you want an array of exactly n ones and m zeros, or an array of n+m elements that on average will have n ones and m zeros? Commented Nov 11, 2014 at 20:01
  • exactly n ones and m zeros Commented Nov 11, 2014 at 20:02
  • By the way, you probably want to use np.random.shuffle, not random.shuffle. Commented Nov 11, 2014 at 20:02
  • Your solution looks perfectly fine and Pythonic to me, if you want exact numbers of ones and zeros. Commented Nov 11, 2014 at 20:06

4 Answers


Your solution seems reasonable. It states exactly what it's doing, and does it clearly.

Let's compare your implementation:

a = np.hstack((np.ones(n), np.zeros(m)))
np.random.shuffle(a)

… with an obvious alternative:

a = np.ones(n+m)
a[:m] = 0
np.random.shuffle(a)

That might save a bit of time by not allocating and moving chunks of data around, but it takes a bit more thought to understand.

And doing it in Python instead of in NumPy:

a = np.array([1]*n + [0]*m)
np.random.shuffle(a)

… might be a little more concise, but it seems less idiomatically NumPy (in the same way that np.array([1]*n) is less idiomatic than np.ones(n)), and it's going to be slower and use more memory for no good reason. (You could improve the memory by using np.fromiter, but then it's pretty clearly not going to be more concise.)
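For completeness, the np.fromiter variant mentioned above might look something like this (a sketch; passing count lets NumPy preallocate the output array instead of growing it):

```python
import itertools
import numpy as np

n, m = 4, 3  # example sizes
a = np.fromiter(itertools.chain(itertools.repeat(1, n), itertools.repeat(0, m)),
                dtype=float, count=n + m)
np.random.shuffle(a)
print(int(a.sum()))  # 4
```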

Of course if you're doing this more than once, the real answer is to factor it out into a function. Then the function's name will explain what it does, and almost any solution that isn't too tortured will be pretty easy to understand…
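For example, such a function might look like this (a sketch; the name ones_and_zeros is just illustrative):

```python
import numpy as np

def ones_and_zeros(n, m):
    """Return a shuffled 1-D array containing exactly n ones and m zeros."""
    a = np.hstack((np.ones(n), np.zeros(m)))
    np.random.shuffle(a)  # in-place shuffle
    return a

a = ones_and_zeros(400, 600)
print(int(a.sum()), len(a) - int(a.sum()))  # 400 600
```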




I'd make an array of n ones and m zeros as

a = np.array([1] * n + [0] * m)

Then I'd call np.random.shuffle() on it.
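Combined into one runnable snippet (a sketch, with example sizes picked arbitrarily):

```python
import numpy as np

n, m = 5, 2  # example sizes
a = np.array([1] * n + [0] * m)  # build the Python list first, then convert
np.random.shuffle(a)             # shuffle in place
print(int(a.sum()), len(a))  # 5 7
```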

5 Comments

  • Why would you do this? It's going to be slower and use more memory, and it's hard to see a compensating benefit.
  • It's most easily understandable to me (and, so I assume, to coworkers). That's the only thing that counts until memory or speed becomes a problem and this part of the code turns out to be the bottleneck.
  • I didn't know about np.random.shuffle; I'll edit the answer to use that.
  • What's hard to understand about np.hstack((np.ones(n), np.zeros(m)))? That says what it does directly, in NumPy terms. Would you say that np.array([1] * n) is more readable than np.ones(n)?
  • Creating lists only to convert them to arrays later is slow for large sizes; it's better to use np.ones and np.zeros and concatenate them, exactly as the OP suggested in the question.

Use numpy.random.permutation:

a = numpy.random.permutation([1] * n + [0] * m)

or, using arrays instead of an initial list:

a = numpy.random.permutation(numpy.concatenate((numpy.ones(n), numpy.zeros(m))))

(I don't know enough about numpy to comment on the difference between concatenate and hstack; they seem to produce the same results here.)

4 Comments

  • Why use permutation to make a shuffled copy instead of shuffling in place? I can see a benefit in making the whole thing an expression, and in pure Python code I'd probably write something like this, but in NumPy code it doesn't seem as idiomatic.
  • Shuffling in place may indeed be better. I just noticed that permutation seemed to be the NumPy "equivalent" of pure Python sorted (or, rather, of random.shuffled, the hypothetical counterpart to random.shuffle).
  • Yeah, it's a bit strange that np.random has an equivalent of shuffled, where you don't often want it, but the stdlib's random doesn't have one, where you often would want it…
  • As for the difference between concatenate and hstack: if you don't pass an axis argument and you've got 1-D arrays, there's no difference at all; it's just a matter of which one you find more readable for a given problem. I think I would have chosen concatenate here, but since the OP chose hstack, I figured it was better to stick with that.
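A quick check of that equivalence for 1-D inputs (a sketch):

```python
import numpy as np

a, b = np.ones(4), np.zeros(3)
# Without an axis argument, the two calls agree for 1-D arrays.
print(np.array_equal(np.concatenate((a, b)), np.hstack((a, b))))  # True
```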

I think your solution is suitable, in that it's readable and Pythonic. You didn't say whether memory or performance are considerations. It's possible that np.random.shuffle is as good as O(m + n), but the other answers suggest that it does more than a single pass to shuffle the values. You could do it in O(m + n) with a single pass and no memory overhead beyond the output list, like this:

import random

m = 600  # zeros
n = 400  # ones

result = []
while m + n > 0:
    if m > 0 and random.random() < m / (m + n):
        result.append(0)
        m -= 1
    else:
        result.append(1)
        n -= 1
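Wrapped in a function with a quick sanity check, the same single-pass idea looks like this (a sketch; the name random_bits is just illustrative):

```python
import random

def random_bits(n_ones, n_zeros):
    """Emit exactly n_ones ones and n_zeros zeros in a single pass:
    at each step a zero is chosen with probability proportional to
    the number of zeros still remaining."""
    n, m = n_ones, n_zeros
    result = []
    while m + n > 0:
        if m > 0 and random.random() < m / (m + n):
            result.append(0)
            m -= 1
        else:
            result.append(1)
            n -= 1
    return result

bits = random_bits(400, 600)
print(bits.count(1), bits.count(0))  # 400 600
```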

