
I need to do some intense numerical computations, and fortunately Python offers very simple ways to implement parallelisation. However, the results I got were totally weird, and after some trial and error I stumbled upon the problem.

The following code simply calculates the mean of a random sample of numbers but illustrates my problem:

import multiprocessing
import numpy as np
from numpy.random import random

# Draw 1000 random numbers, scale them, and return their mean
def get_random(seed):
    dummy = random(1000) * seed
    return np.mean(dummy)

# Input data
input_data = [100, 100, 100, 100]

pool = multiprocessing.Pool(processes=4)
result = pool.map(get_random, input_data)
print(result)

for i in input_data:
    print(get_random(i))

Now the output looks like this:

[51.003368466729405, 51.003368466729405, 51.003368466729405, 51.003368466729405]

for the parallelised version, where every entry is identical,

and like this for the ordinary, non-parallelised loop:

50.8581749381
49.2887091049
50.83585841
49.3067281055

As you can see, the parallelised version just returns the same result four times, even though it should have calculated different means, just like the loop does. Sometimes I get only three equal numbers, with one differing from the other three.

I suspect that the subprocesses somehow share state or memory. I would love some hints on what is going on here and what a fix would look like. :)

thanks

2 Answers


When you use multiprocessing, you're talking about distinct processes. Distinct processes mean distinct Python interpreters, and distinct interpreters mean distinct random states. On platforms where multiprocessing forks the parent process, every worker also inherits an identical copy of NumPy's global random state. If you don't seed the random number generator uniquely in each process, every worker starts from the same state and produces the same numbers.
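
A quick way to confirm this (a minimal sketch, assuming a fork-based start method, the default on Linux) is to have each worker report the first few words of its Mersenne Twister state vector; identical values mean the workers inherited the same state:

import multiprocessing

import numpy as np

def show_state(_):
    # First three words of the Mersenne Twister state vector; identical
    # lists across workers mean they inherited the same RNG state.
    return np.random.get_state()[1][:3].tolist()

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    print(pool.map(show_state, range(4)))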


3 Comments

Hmm. Good to know, thanks, but unfortunately I can't see a way to implement this for my routine, as I call the random number generator multiple times with different instances. Is there perhaps a different way to solve this issue?
Prior to computing random numbers in parallel, distribute seeds to each process using numpy.random.seed. This requires unique seeds, which could be as simple as generating some random numbers on your main process, or just enumerating 1 to n, the number of workers in your pool (see the sketch after these comments).
Ahh! Indeed. If I simply put np.random.seed() in the get_random function I do get different results. Perfect and thanks a lot!!
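
Here is a minimal sketch of the seed-distribution idea from the comments, adapted from the question's code (the scale factor of 100 is hard-coded, and the seeds are just 0 to 3 for illustration):

import multiprocessing

import numpy as np

def get_random(seed):
    # Seed this worker's global RNG with the unique value it was given,
    # so every process draws an independent sample.
    np.random.seed(seed)
    dummy = np.random.random(1000) * 100
    return np.mean(dummy)

if __name__ == '__main__':
    # Any set of distinct integers works as seeds; 0..3 is enough here.
    pool = multiprocessing.Pool(processes=4)
    print(pool.map(get_random, range(4)))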

The answer was to set a new random seed in each process. Changing the function to

def get_random(seed):
    # Re-seed this process's RNG; with no argument, NumPy pulls fresh
    # entropy from the OS, so each worker gets an independent state.
    np.random.seed()
    dummy = random(1000) * seed
    return np.mean(dummy)

gives the desired results. 😊
