Evenly distributed random strings in Python

Question

I want to generate random strings (or arrays) of 1's and 0's, and then classify each one by the number of 1's it contains. I want the generated strings to be evenly distributed in these classes, i.e. every possible count of 1's should be equally likely.

But the following code gives me a normal distribution:

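import numpy as np

num_examples = 10000  # example values; not given in the original post
seq_length = 10

sequences = np.zeros((num_examples, seq_length), dtype='float32')
for i in range(num_examples):
    # Each element is independently 0 or 1 with probability 1/2
    seq = np.random.randint(2, size=seq_length).astype('float32')
    sequences[i] = seq

target_classes = []
for seq in sequences:
    target = (seq == 1).sum()  # number of 1's in this sequence
    target_classes.append(target)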

The histogram of counts is bell-shaped, peaking around seq_length / 2 instead of being flat.

A NumPy solution would be awesome. Or do I need regular expressions or something else?

"So I want the generated strings to be evenly distributed in these classes" - why do you expect them to be evenly distributed in these classes? That's like expecting 0 heads to be as likely as 50 heads in a sequence of 100 coin flips.
– user2357112, Commented Aug 10, 2017 at 22:37
This seems more of a mathematical or statistical problem. Once you know the math that produces the distribution you want, converting it to Python should be straightforward.
– Barmar, Commented Aug 10, 2017 at 22:37
OP didn't say he expects them to be evenly distributed, but that he wants them evenly distributed.
– Prune, Commented Aug 10, 2017 at 22:38
@juanpa.arrivillaga string or array, not a problem. I can convert one to another.
– Sait Banazili, Commented Aug 10, 2017 at 22:40
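
The coin-flip point in the first comment can be checked numerically. The following is only an illustrative sketch: it assumes a sequence length of 10 (a value chosen here, not stated in the question) and computes how likely each count of 1's is when every bit is drawn independently with probability 1/2.

```python
from math import comb

SEQ_LENGTH = 10  # assumed length, chosen only for illustration

# Probability that SEQ_LENGTH independent fair bits contain exactly k ones:
# C(SEQ_LENGTH, k) / 2**SEQ_LENGTH (a binomial distribution).
probabilities = [comb(SEQ_LENGTH, k) / 2**SEQ_LENGTH
                 for k in range(SEQ_LENGTH + 1)]

for k, p in enumerate(probabilities):
    print(f"{k:2d} ones: {p:.4f}")
```

The middle counts are far more likely than the extremes, which is why the histogram in the question is bell-shaped rather than flat.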

MSeifert · Accepted Answer · 2017-08-10 23:08:40Z

As @Prune already noted, this is essentially a two-step process. First, draw the number of ones for each sequence from a uniform distribution (for example with np.random.randint). Then set that many elements of the sequence to one at randomly chosen positions (for example with np.random.choice).

One possibility would be:

import numpy as np

NUM_EXAMPLES = 10000
SEQ_LENGTH = 10

sequences = np.zeros((NUM_EXAMPLES, SEQ_LENGTH), dtype=np.int8)
# Number of ones in each sequence, drawn uniformly from 0 to SEQ_LENGTH inclusive
number_of_1s = np.random.randint(0, SEQ_LENGTH+1, size=NUM_EXAMPLES)

indices = np.arange(SEQ_LENGTH)
for idx, num_ones in enumerate(number_of_1s.tolist()):
    # Set "num_ones" randomly chosen elements to 1, using "choice" without replacement.
    sequences[idx][np.random.choice(indices, num_ones, replace=False)] = 1

A histogram of the counts shows that they are quite evenly distributed:

import matplotlib.pyplot as plt

plt.hist(np.sum(sequences==1, axis=1), bins=np.arange(SEQ_LENGTH+2)-0.5, histtype='step')
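
If the Python loop becomes a bottleneck, the same two steps can probably be done without it. The sketch below is not part of the original answer. It assumes NumPy 1.17 or later (for np.random.default_rng), and the seed and the names rng and random_order are choices made here for illustration. It picks the positions of the ones by giving every row its own random permutation of the column indices and keeping the entries whose rank falls below the desired count.

```python
import numpy as np

NUM_EXAMPLES = 10000
SEQ_LENGTH = 10

rng = np.random.default_rng(0)  # seed fixed only to make the sketch reproducible

# Step 1: uniform number of ones per sequence, from 0 to SEQ_LENGTH inclusive.
number_of_1s = rng.integers(0, SEQ_LENGTH + 1, size=NUM_EXAMPLES)

# Step 2: argsort of random floats gives each row an independent random
# permutation of 0..SEQ_LENGTH-1. Exactly k entries of a row are smaller
# than k, and they sit at uniformly random positions.
random_order = rng.random((NUM_EXAMPLES, SEQ_LENGTH)).argsort(axis=1)
sequences = (random_order < number_of_1s[:, None]).astype(np.int8)
```

This trades the per-row call to choice for one sort per row, so whether it is actually faster depends on the sizes involved.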

Prune · Accepted Answer · 2017-08-10 22:37:54Z

If you want equal distribution of the quantity of 1's, then I think you'll find it easiest to first generate the quantity, and then to randomly distribute that many 1's through the binary representation. This is a two-step process, almost by necessity.

With that hint, can you do the coding on your own?

3 Comments

Prune Over a year ago

No, one random call to get the quantity, then something to distribute that many 1s in a string of 0s. I don't see a role for regex in this.

Sait Banazili Over a year ago

I think I don't even need the random call. Just an integer array from 0 to (num_of_seq / len_of_seq). I will try to implement "something" part. Thanks.

Prune Over a year ago

Ah ... if you want to guarantee equal distribution, yes -- you do not need the first random call. I was thinking of a uniform random distribution.
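
One way to read this exchange is that the counts themselves can be assigned deterministically, so that every class receives the same number of sequences (up to rounding), and only the positions of the 1's stay random. The sketch below is one interpretation, not code from the thread: the constants, the seed, and the modulo scheme for assigning counts are all assumptions.

```python
import numpy as np

NUM_EXAMPLES = 10000
SEQ_LENGTH = 10

rng = np.random.default_rng(0)  # seed fixed only to make the sketch reproducible

# Cycle through 0..SEQ_LENGTH so each count occurs equally often
# (differing by at most one when NUM_EXAMPLES is not a multiple of SEQ_LENGTH + 1).
number_of_1s = np.arange(NUM_EXAMPLES) % (SEQ_LENGTH + 1)
rng.shuffle(number_of_1s)  # avoid a predictable order of classes

sequences = np.zeros((NUM_EXAMPLES, SEQ_LENGTH), dtype=np.int8)
for idx, num_ones in enumerate(number_of_1s.tolist()):
    # Random positions for the ones, drawn without replacement.
    positions = rng.choice(SEQ_LENGTH, size=num_ones, replace=False)
    sequences[idx, positions] = 1

# Number of sequences that ended up in each class (count of 1's).
counts_per_class = np.bincount(sequences.sum(axis=1), minlength=SEQ_LENGTH + 1)
print(counts_per_class)
```

Unlike drawing the counts with a random call, this makes the class sizes exactly balanced rather than balanced only on average.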
