I need to do some intensive numerical computation, and fortunately Python offers very simple ways to parallelise it. However, the results I got were completely weird, and after some trial and error I stumbled upon the problem.
The following code simply calculates the mean of a random sample of numbers but illustrates my problem:
import multiprocessing
import numpy as np
from numpy.random import random

# Function to compute the mean of a scaled random sample
def get_random(seed):
    dummy = random(1000) * seed
    return np.mean(dummy)

# Input data
input_data = [100, 100, 100, 100]

pool = multiprocessing.Pool(processes=4)
result = pool.map(get_random, input_data)
print result

for i in input_data:
    print get_random(i)
Now the output looks like this:
[51.003368466729405, 51.003368466729405, 51.003368466729405, 51.003368466729405]
for the parallelised run, where every result is identical,
and like this for the ordinary, non-parallelised loop:
50.8581749381
49.2887091049
50.83585841
49.3067281055
As you can see, the parallelised version just returns the same result four times, even though it should have calculated different means, just like the loop does. Sometimes I even get only three equal numbers, with one differing from the other three.
I suspect that some state is copied into all the subprocesses. I would love some hints on what is going on here and what a fix would look like. :)

Thanks