How to make a NumPy array of 0s and 1s based on a probability array?

Question

I know that using Python's random.choices I can do this:

import random


array_probabilities = [0.5 for _ in range(4)]
print(array_probabilities)  # [0.5, 0.5, 0.5, 0.5]

a = [random.choices([0, 1], weights=[1 - probability, probability])[0] for probability in array_probabilities]
print(a)  # [1, 1, 1, 0]


Using random.choices is fast, but I know numpy is even faster. I would like to know how to write the same code but using numpy. I'm just getting started with numpy and would appreciate your feedback.

Is this your answer? link

– Mahdi F., Commented Oct 30, 2022 at 15:06

mozway · Accepted Answer · 2022-10-30 15:36:39Z

3

One option:

# u < p is True with probability p when u is uniform on [0, 1)
out = (np.random.random(size=len(array_probabilities)) < array_probabilities).astype(int)

Example output:

array([0, 1, 0, 1])

answered Oct 30, 2022 at 15:36

mozway


4 Comments

vanstrouble Over a year ago

It doesn't work. If you have array = [0.2, 0.3, 0.6, 0.9], the output should be [0, 0, 1, 1] on most iterations, but it always stays random.

mozway Over a year ago

Of course it's always random, but with weights. I don't see how this "doesn't work". Also the <0.5 probabilities are not linked. There is no reason they should have the same values.

vanstrouble Over a year ago

Sorry, I was trying your solution and it gave me strange results: with p = [1, 1, 1, 1] the output was [0, 1, 0, 1] most of the time. Changing the > to < makes it work. Could you explain how that works?

Saurabh Agrawal Over a year ago

How does that work? np.random.random() generates a random number x uniformly distributed in [0, 1). For any value c in array_probabilities, the probability that x < c is therefore exactly c, so the comparison x < c is True with probability c. Hope this helps.
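The argument in this comment can be checked empirically. The following is a minimal sketch and not part of the original answer: it assumes the `<` comparison discussed in the comments, uses NumPy's `default_rng` with an arbitrarily chosen seed, and the helper name `sample_bits` is hypothetical.

```python
import numpy as np


def sample_bits(probabilities, rng):
    """Return an int array whose i-th entry is 1 with probability probabilities[i]."""
    p = np.asarray(probabilities, dtype=float)
    # rng.random draws from [0, 1), so u < p is True with probability p.
    return (rng.random(p.shape) < p).astype(int)


rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability
p = [0.2, 0.3, 0.6, 0.9]

# Repeat the draw many times; the column means should approach p.
draws = np.array([sample_bits(p, rng) for _ in range(20_000)])
print(draws.mean(axis=0))

# Edge cases: p = 0 never yields 1, and p = 1 always yields 1.
print(sample_bits([0.0, 1.0], rng))  # [0 1]
```

Each individual draw is still random, so a single output for p = [0.2, 0.3, 0.6, 0.9] need not be [0, 0, 1, 1]; only the long-run frequencies match the probabilities.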

Kutay Kılıç · Accepted Answer · 2022-10-31 05:55:41Z

2

Your question got me wondering, so I wrote a basic function to compare their timings. And it seems you are right! The timings differ, but only a little. The code and its output are below.

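import random
import time

import numpy as np


def stack_question():
    # Time the list comprehension from the question, in milliseconds.
    start = time.perf_counter()
    array_probabilities = [0.5 for _ in range(4)]
    a = [random.choices([0, 1], weights=[1 - probability, probability])[0] for probability in array_probabilities]
    end = time.perf_counter()
    return (end - start) * 1000


def numpy_random_array():
    # Time drawing four uniform samples with NumPy, in milliseconds.
    # This only generates the samples; it does not threshold them.
    start_time = time.perf_counter()
    val = np.random.rand(4, 1)
    end_time = time.perf_counter()
    return (end_time - start_time) * 1000


print("List implementation  ", stack_question())
print("Array implementation  ", numpy_random_array())

The output reported from the original run, which subtracted an end time in seconds from a start time in milliseconds, so the values are dominated by the epoch timestamp and do not represent elapsed time:

List implementation   1665476650232.8433
Array implementation   1665476650233.9226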

Edit: On GeeksforGeeks I found the following explanation of why it is faster to use NumPy arrays.

NumPy Arrays are faster than Python Lists because of the following reasons:
An array is a collection of homogeneous data-types that are stored in contiguous memory locations. On the other hand, a list in Python is a collection of heterogeneous data types stored in non-contiguous memory locations. The NumPy package breaks down a task into multiple fragments and then processes all the fragments parallelly. The NumPy package integrates C, C++, and Fortran codes in Python. These programming languages have very little execution time compared to Python.
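To see how the two approaches scale, a like-for-like comparison can be run with `timeit` at several array sizes. This is a hedged sketch rather than the answer's own benchmark: the function names `python_bits` and `numpy_bits`, the sizes, and the repeat count are all assumptions chosen for illustration, and the timings depend on the machine.

```python
import random
import timeit

import numpy as np


def python_bits(probabilities):
    # Pure-Python version from the question: one random.choices call per element.
    return [random.choices([0, 1], weights=[1 - p, p])[0] for p in probabilities]


def numpy_bits(probabilities, rng):
    # Vectorized version: one uniform draw per element, compared against p.
    return (rng.random(probabilities.shape) < probabilities).astype(int)


rng = np.random.default_rng()
repeats = 3
for n in (4, 1_000, 100_000):
    p_array = np.full(n, 0.5)
    p_list = p_array.tolist()
    # Average wall-clock time per call, in seconds.
    t_py = timeit.timeit(lambda: python_bits(p_list), number=repeats) / repeats
    t_np = timeit.timeit(lambda: numpy_bits(p_array, rng), number=repeats) / repeats
    print(f"n={n:>7}: list {t_py * 1e3:.3f} ms, numpy {t_np * 1e3:.3f} ms")
```

Timing several sizes matters because, for only four elements, per-call overhead dominates both versions and the difference says little about larger inputs.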

edited Oct 31, 2022 at 5:55

answered Oct 30, 2022 at 16:08

Kutay Kılıç


3 Comments

Kutay Kılıç Over a year ago

Also, if runtime is what you need, using Numba could help, especially if you have a GPU for those operations!

vanstrouble Over a year ago

Both are fast, but my understanding is that NumPy stays fast at larger amounts of data, while native Python doesn't.

Kutay Kılıç Over a year ago

@vanstrouble Yes, you are right. I also found an explanation of this on GeeksforGeeks and will update my post with it.

LLaP · Accepted Answer · 2022-10-30 17:17:56Z

0

probabilities = np.random.rand(1,10)
bools_arr = np.apply_along_axis(lambda x: 1 if x > 0.5 else 0, 1, [probabilities])

answered Oct 30, 2022 at 17:17

LLaP


2 Comments

vanstrouble Over a year ago

In this case you are only evaluating that for it to be 1 it must be greater than 0.5, but it is not evaluated according to the probability criteria.

LLaP Over a year ago

What do you mean by probability criteria?

Perico · Accepted Answer · 2024-08-03 12:41:11Z

0

Answering an old question... Could this be what you're looking for?

p1 = 0.5

np.random.choice([0,1], p=[1-p1, p1], size=4)

You can build the p array however you want, for example p = [0.5 for _ in range(2)]; it must have the same length as the list of values.
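Note that `np.random.choice` applies a single probability vector `p` to every draw, so it fits the case where all elements share one probability. When each element has its own probability, as in the question, one option is an independent Bernoulli draw per element. The following is a minimal sketch under that assumption, using NumPy's `Generator.binomial`, which accepts an array for `p`; the example probabilities are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng()
array_probabilities = np.array([0.0, 0.2, 0.5, 0.9, 1.0])

# A binomial draw with n=1 is a Bernoulli trial; p is applied element-wise,
# so out[i] is 1 with probability array_probabilities[i].
out = rng.binomial(1, array_probabilities)
print(out)  # the first entry is always 0 and the last is always 1
```

The result is already an integer array, so no `astype(int)` conversion is needed.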

answered Aug 3, 2024 at 12:41

Perico

