
I know that using Python's random.choices I can do this:

import random


array_probabilities = [0.5 for _ in range(4)]
print(array_probabilities)  # [0.5, 0.5, 0.5, 0.5]

a = [random.choices([0, 1], weights=[1 - probability, probability])[0] for probability in array_probabilities]
print(a)  # [1, 1, 1, 0]

How can I make a NumPy array of 0s and 1s based on a probability array?

Using random.choices is fast, but I know NumPy is even faster. I would like to know how to write the same code using NumPy. I'm just getting started with NumPy and would appreciate your feedback.

  • Is this your answer: link (commented Oct 30, 2022 at 15:06)

4 Answers


One option:

import numpy as np
out = (np.random.random(size=len(array_probabilities)) < array_probabilities).astype(int)

Example output:

array([0, 1, 0, 1])
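A minimal sketch of the same idea using NumPy's newer Generator API (`np.random.default_rng`). The helper name `sample_binary` and the seed are illustrative assumptions, not part of the answer. Because `rng.random` draws from [0, 1), comparing with `<` (not `>`) makes each element 1 with exactly its own probability, which also means a probability of 0 always gives 0 and a probability of 1 always gives 1.

```python
import numpy as np


def sample_binary(probabilities, rng=None):
    """Return an int array with 1 where a uniform draw falls below the probability.

    `sample_binary` is a hypothetical helper name used for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(probabilities, dtype=float)
    # rng.random draws uniformly from [0, 1), so P(u < p) == p for 0 <= p <= 1
    u = rng.random(p.shape)
    return (u < p).astype(int)


rng = np.random.default_rng(seed=42)  # seeded only to make runs reproducible
print(sample_binary([0.2, 0.3, 0.6, 0.9], rng))
print(sample_binary([0.0, 1.0, 0.0, 1.0], rng))  # [0 1 0 1] on every run
```

Passing the generator in explicitly keeps the function reproducible in tests while still working without a seed.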

4 Comments

It doesn't work. If you have array = [0.2, 0.3, 0.6, 0.9], the output should be [0, 0, 1, 1] on most iterations, but it always stays random.
Of course it's always random, but weighted. I don't see how this "doesn't work". Also, the two probabilities below 0.5 are not linked, so there is no reason they should take the same values.
Sorry, I was trying your solution and it gave me strange results: with p = [1, 1, 1, 1] the output was [0, 1, 0, 1] most of the time. But changing the > to < works. Could you explain how that works?
How it works: np.random.random() generates a number uniformly distributed in [0, 1). So each time it generates a number x, the probability that x < c, where c is the corresponding entry of array_probabilities, is exactly c. Hence the comparison x < c is True with probability c. Hope this helps.
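The argument can be checked empirically: draw many rows of uniform numbers, compare every row against the probability vector via broadcasting, and the fraction of 1s in each column should approach the corresponding probability. A small sketch, where the trial count, seed, and tolerance are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
p = np.array([0.2, 0.3, 0.6, 0.9])
n_trials = 100_000

# Shape (n_trials, 4): each row is one independent draw of four uniform numbers
u = rng.random((n_trials, p.size))
# Broadcasting compares every row against p; True where u < p
ones = (u < p).astype(int)

# Fraction of 1s in each column estimates P(u < p_i), which should be close to p_i
freq = ones.mean(axis=0)
print(freq)
print(np.allclose(freq, p, atol=0.01))
```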

Your question got me wondering, so I wrote a basic function to compare their timings. It seems you are right! The timings differ, but only a little. The code and its output are below.

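import numpy as np
import time
import random

def stack_question():
    # NOTE: start is in milliseconds (multiplied by 1000) but end below is in seconds,
    # and the function returns start - end, so the value is not an elapsed time
    start = time.time() * 1000
    array_probabilities = [0.5 for _ in range(4)]
    a = [random.choices([0, 1], weights=[1 - probability, probability])[0] for probability in array_probabilities]
    end = time.time()
    return (start - end)

def numpy_random_array():
    # Same unit mismatch as above; this version also only draws 4 uniform numbers
    # and never converts them to 0s and 1s
    start_time = time.time() * 1000
    val = np.random.rand(4, 1)
    end_time = time.time()
    return (start_time - end_time)

print("List implementation  ", stack_question())

print("Array implementation  ", numpy_random_array())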

The output:

List implementation   1665476650232.8433
Array implementation   1665476650233.9226
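The timing code above does not measure elapsed time (start and end use different units and are subtracted in reverse order), so here is a sketch of a fairer comparison using the standard-library `timeit` module. The array size of 10,000, the call count of 20, and the function names are arbitrary assumptions; results depend on the machine, so none are shown.

```python
import random
import timeit

import numpy as np

n = 10_000
probabilities = [0.5] * n
p_array = np.array(probabilities)
rng = np.random.default_rng()


def with_random_choices():
    # Pure-Python version from the question: one random.choices call per element
    return [random.choices([0, 1], weights=[1 - p, p])[0] for p in probabilities]


def with_numpy():
    # Vectorized version: one call draws all uniforms, one comparison thresholds them
    return (rng.random(p_array.size) < p_array).astype(int)


for func in (with_random_choices, with_numpy):
    # timeit.timeit returns the total seconds for `number` calls
    seconds = timeit.timeit(func, number=20)
    print(f"{func.__name__}: {seconds / 20 * 1000:.3f} ms per call")
```

Timing a larger input matters here: with only 4 elements, fixed call overhead dominates and the two approaches look similar.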

Edit: On GeeksforGeeks I found the following explanation of why NumPy arrays are faster:

NumPy Arrays are faster than Python Lists because of the following reasons:
An array is a collection of homogeneous data-types that are stored in contiguous memory locations. On the other hand, a list in Python is a collection of heterogeneous data types stored in non-contiguous memory locations. The NumPy package breaks down a task into multiple fragments and then processes all the fragments parallelly. The NumPy package integrates C, C++, and Fortran codes in Python. These programming languages have very little execution time compared to Python.
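Part of this explanation can be inspected directly: a NumPy array stores fixed-size elements of one dtype in a single buffer, while a list stores references to separate Python objects. A small sketch using the array attributes `dtype`, `itemsize`, `nbytes`, and `flags`. The parallelism claim in the quote is doubtful: elementwise operations such as `<` generally run as a single-threaded compiled loop, so the speedup mostly comes from skipping per-element interpreter overhead rather than from parallel processing.

```python
import sys

import numpy as np

# Distinct float values, so the list really holds 1000 separate float objects
values = [i / 1000 for i in range(1000)]
arr = np.array(values)

# One homogeneous dtype, one fixed element size, one contiguous buffer
print(arr.dtype)                  # float64
print(arr.itemsize)               # 8
print(arr.nbytes)                 # 8000
print(arr.flags["C_CONTIGUOUS"])  # True

# A list holds pointers to Python float objects stored elsewhere in memory;
# sys.getsizeof counts only the list's own pointer array, not the floats themselves
print(sys.getsizeof(values))
print(sys.getsizeof(values[1]))   # size of a single Python float object
```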

3 Comments

Also, using Numba could help if you have a GPU for those operations and runtime is what you need!
Both are fast, but my understanding is that NumPy stays fast at larger amounts of data, while native Python doesn't.
@vanstrouble Yeah, you are right. I also found an explanation of this on GeeksforGeeks; I will update my post accordingly.
import numpy as np
probabilities = np.random.rand(1, 10)
# Same result as applying `1 if x > 0.5 else 0` to each element, without apply_along_axis
bools_arr = (probabilities > 0.5).astype(int)

2 Comments

In this case you only check whether each value is greater than 0.5 to make it 1; the result is not drawn according to the per-element probabilities.
What do you mean by probability criteria?

Answering an old question... Could this be what you're looking for?

import numpy as np

p1 = 0.5
# Draw 4 samples, each 1 with probability p1 and 0 otherwise
np.random.choice([0, 1], p=[1 - p1, p1], size=4)

You can choose the p array however you like, for example p = [0.5 for _ in range(2)]; it must have the same length as the array of values ([0, 1] here), and its entries must sum to 1.
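Note that `choice` takes a single probability vector `p` that is shared by every draw, so it covers the case where all elements have the same probability. For a different probability per element, as in the question, one option (a different technique from this answer) is a binomial draw with `n=1`, which is a Bernoulli trial and accepts an array of probabilities. A sketch with the Generator API, using the probability list from the comments under the first answer; the seed is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(seed=7)
array_probabilities = np.array([0.2, 0.3, 0.6, 0.9])

# Same probability for every element: one p vector shared by all draws
same_p = rng.choice([0, 1], p=[0.5, 0.5], size=4)

# Per-element probabilities: a binomial draw with n=1 is a Bernoulli trial,
# and p is applied element-wise across the array
per_element = rng.binomial(n=1, p=array_probabilities)

print(same_p)
print(per_element)
```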
