
I want to randomly produce an array of n ones and m zeros.

I thought of this solution:

  1. produce the ones array (np.ones)
  2. produce the zeros array (np.zeros)
  3. combine them to one array (np.hstack)
  4. shuffle the resulting array (np.random.shuffle)
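Put together, the four steps above look like this (a minimal sketch, assuming n and m are already defined; small example sizes used here for illustration):

```python
import numpy as np

n, m = 4, 3  # example sizes
a = np.hstack((np.ones(n), np.zeros(m)))  # n ones followed by m zeros
np.random.shuffle(a)                      # shuffle in place
print(int(a.sum()))  # 4 (the number of ones survives the shuffle)
```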

This doesn't seem very natural as a solution. Any more Pythonic ideas?

  • Do you want an array of exactly n ones and m zeros, or an array of n+m elements that on average will have n ones and m zeros? Commented Nov 11, 2014 at 20:01
  • exactly n ones and m zeros Commented Nov 11, 2014 at 20:02
  • By the way, you probably want to use np.random.shuffle, not random.shuffle. Commented Nov 11, 2014 at 20:02
  • Your solution looks perfectly fine and Pythonic to me, if you want exact numbers of ones and zeros. Commented Nov 11, 2014 at 20:06

4 Answers


Your solution seems reasonable. It states exactly what it's doing, and does it clearly.

Let's compare your implementation:

a = np.hstack((np.ones(n), np.zeros(m)))
np.random.shuffle(a)

… with an obvious alternative:

a = np.ones(n+m)
a[:m] = 0
np.random.shuffle(a)

That might save a bit of time by not allocating and moving chunks of data around, but it takes a bit more thought to understand.

And doing it in Python instead of in NumPy:

a = np.array([1]*n + [0]*m)
np.random.shuffle(a)

… might be a little more concise, but it seems less idiomatically NumPy (in the same way that np.array([1]*n) is less idiomatic than np.ones(n)), and it's going to be slower and use more memory for no good reason. (You could improve the memory by using np.fromiter, but then it's pretty clearly not going to be more concise.)
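For completeness, the np.fromiter variant mentioned above might look something like this (a sketch; passing count lets NumPy preallocate the output array instead of growing it):

```python
import itertools
import numpy as np

n, m = 4, 3  # example sizes
a = np.fromiter(itertools.chain(itertools.repeat(1, n), itertools.repeat(0, m)),
                dtype=float, count=n + m)
np.random.shuffle(a)
print(int(a.sum()))  # 4
```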

Of course if you're doing this more than once, the real answer is to factor it out into a function. Then the function's name will explain what it does, and almost any solution that isn't too tortured will be pretty easy to understand…
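For example, such a function might look like this (a sketch; the name ones_and_zeros is just illustrative):

```python
import numpy as np

def ones_and_zeros(n, m):
    """Return a shuffled 1-D array containing exactly n ones and m zeros."""
    a = np.hstack((np.ones(n), np.zeros(m)))
    np.random.shuffle(a)  # in-place shuffle
    return a

a = ones_and_zeros(400, 600)
print(int(a.sum()), len(a) - int(a.sum()))  # 400 600
```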




I'd make an array of n ones and m zeros as

a = np.array([1] * n + [0] * m)

Then I'd call np.random.shuffle() on it.
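Combined into one runnable snippet (a sketch, with example sizes picked arbitrarily):

```python
import numpy as np

n, m = 5, 2  # example sizes
a = np.array([1] * n + [0] * m)  # build the Python list first, then convert
np.random.shuffle(a)             # shuffle in place
print(int(a.sum()), len(a))  # 5 7
```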

5 Comments

  • Why would you do this? It's going to be slower and use more memory, and it's hard to see a compensating benefit.
  • It's most easily understandable to me (and, so I assume, to coworkers). That's the only thing that counts until memory or speed becomes a problem and this part of the code turns out to be the bottleneck.
  • I didn't know about np.random.shuffle; I'll edit the answer to use that.
  • What's hard to understand about np.hstack((np.ones(n), np.zeros(m)))? That says what it does directly, in NumPy terms. Would you say that np.array([1] * n) is more readable than np.ones(n)?
  • Creating lists only to convert them to arrays later is slow for large sizes; it's better to use np.ones and np.zeros and concatenate them, exactly as the OP suggested in the question.

Use numpy.random.permutation:

a = numpy.random.permutation([1] * n + [0] * m)

or, using arrays instead of an initial list:

a = numpy.random.permutation(numpy.concatenate((numpy.ones(n), numpy.zeros(m))))

(I don't know enough about numpy to comment on the difference between concatenate and hstack; they seem to produce the same results here.)

4 Comments

  • Why use permutation to make a shuffled copy instead of shuffling in place? I can see a benefit in making the whole thing an expression, and in pure Python code I'd probably write something like this, but in NumPy code it doesn't seem as idiomatic.
  • Shuffling in place may indeed be better. I just noticed that permutation seemed to be the NumPy "equivalent" of pure Python sorted (or, rather, of random.shuffled, the hypothetical counterpart to random.shuffle).
  • Yeah, it's a bit strange that np.random has an equivalent of shuffled, where you don't often want it, but the stdlib's random doesn't have one, where you often would want it…
  • As for the difference between concatenate and hstack: if you don't pass an axis argument and you've got 1-D arrays, there's no difference at all; it's just a matter of which one you find more readable for a given problem. I think I would have chosen concatenate here, but since the OP chose hstack, I figured it was better to stick with that.
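A quick check of that equivalence for 1-D inputs (a sketch):

```python
import numpy as np

a, b = np.ones(4), np.zeros(3)
# Without an axis argument, the two calls agree for 1-D arrays.
print(np.array_equal(np.concatenate((a, b)), np.hstack((a, b))))  # True
```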

I think your solution is suitable, in that it's readable and Pythonic. You didn't say whether memory or performance are considerations. It's possible that np.random.shuffle is as good as O(m + n), but the other answers suggest that it does more than a single pass to shuffle the values. You could do it in O(m + n) with a single pass and no memory overhead beyond the output list, like this:

import random

m = 600  # zeros
n = 400  # ones

result = []
while m + n > 0:
    if m > 0 and random.random() < m / (m + n):
        result.append(0)
        m -= 1
    else:
        result.append(1)
        n -= 1
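Wrapped in a function with a quick sanity check, the same single-pass idea looks like this (a sketch; the name random_bits is just illustrative):

```python
import random

def random_bits(n_ones, n_zeros):
    """Emit exactly n_ones ones and n_zeros zeros in a single pass:
    at each step a zero is chosen with probability proportional to
    the number of zeros still remaining."""
    n, m = n_ones, n_zeros
    result = []
    while m + n > 0:
        if m > 0 and random.random() < m / (m + n):
            result.append(0)
            m -= 1
        else:
            result.append(1)
            n -= 1
    return result

bits = random_bits(400, 600)
print(bits.count(1), bits.count(0))  # 400 600
```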

