
I need to do some intense numerical computations, and fortunately Python offers very simple ways to implement parallelisation. However, the results I got were totally weird, and after some trial and error I stumbled upon the problem.

The following code simply calculates the mean of a random sample of numbers but illustrates my problem:

import multiprocessing
import numpy as np
from numpy.random import random

# Draw 1000 random numbers, scale them, and return their mean
def get_random(seed):
    dummy = random(1000) * seed
    return np.mean(dummy)

# Input data
input_data = [100, 100, 100, 100]

pool = multiprocessing.Pool(processes=4)
result = pool.map(get_random, input_data)
print(result)

for i in input_data:
    print(get_random(i))

Now the output looks like this:

[51.003368466729405, 51.003368466729405, 51.003368466729405, 51.003368466729405]

for the parallelised version, where every entry is identical,

and like this for the ordinary, non-parallelised loop:

50.8581749381
49.2887091049
50.83585841
49.3067281055

As you can see, the parallelised version just returns the same result four times, even though it should have calculated different means, just like the loop does. Sometimes I get only three equal numbers, with one differing from the other three.

I suspect that the subprocesses somehow share state or memory. I would love some hints on what is going on here and what a fix would look like. :)

thanks

2 Answers


When you use multiprocessing, you're talking about distinct processes. Distinct processes mean distinct Python interpreters, and distinct interpreters mean distinct random states. On platforms where multiprocessing forks the parent process, every worker also inherits an identical copy of NumPy's global random state. If you don't seed the random number generator uniquely in each process, every worker starts from the same state and produces the same numbers.
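
A quick way to confirm this (a minimal sketch, assuming a fork-based start method, the default on Linux) is to have each worker report the first few words of its Mersenne Twister state vector; identical values mean the workers inherited the same state:

import multiprocessing

import numpy as np

def show_state(_):
    # First three words of the Mersenne Twister state vector; identical
    # lists across workers mean they inherited the same RNG state.
    return np.random.get_state()[1][:3].tolist()

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    print(pool.map(show_state, range(4)))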


3 Comments

Hmm. Good to know, thanks, but unfortunately I can't see a way to implement this for my routine, as I call the random number generator multiple times with different instances. Is there perhaps a different way to solve this issue?
Prior to computing random numbers in parallel, distribute seeds to each process using numpy.random.seed. This requires unique seeds, which could be as simple as generating some random numbers on your main process, or just enumerating 1 to n, the number of workers in your pool (see the sketch after these comments).
Ahh! Indeed. If I simply put np.random.seed() in the get_random function I do get different results. Perfect and thanks a lot!!
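
Here is a minimal sketch of the seed-distribution idea from the comments, adapted from the question's code (the scale factor of 100 is hard-coded, and the seeds are just 0 to 3 for illustration):

import multiprocessing

import numpy as np

def get_random(seed):
    # Seed this worker's global RNG with the unique value it was given,
    # so every process draws an independent sample.
    np.random.seed(seed)
    dummy = np.random.random(1000) * 100
    return np.mean(dummy)

if __name__ == '__main__':
    # Any set of distinct integers works as seeds; 0..3 is enough here.
    pool = multiprocessing.Pool(processes=4)
    print(pool.map(get_random, range(4)))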

The answer was to set a new random seed in each process. Changing the function to

def get_random(seed):
    # Re-seed this process's RNG; with no argument, NumPy pulls fresh
    # entropy from the OS, so each worker gets an independent state.
    np.random.seed()
    dummy = random(1000) * seed
    return np.mean(dummy)

gives the desired results. 😊
