
I wish to run several instances of a simulation in parallel, but with each simulation having its own independent data set.

Currently I implement this as follows:

P = mp.Pool(ncpus) # Generate pool of workers
for j in range(nrun): # Generate processes
    sim = MDF.Simulation(tstep, temp, time, writeout, boundaryxy, boundaryz, relax, insert, lat, savetemp)
    lattice = MDF.Lattice(tstep, temp, time, writeout, boundaryxy, boundaryz, relax, insert, lat, kb, ks, kbs, a, p, q, massL, randinit, initvel, parangle, scaletemp, savetemp)
    adatom1 = MDF.Adatom(tstep, temp, time, writeout, boundaryxy, boundaryz, relax, insert, lat, ra, massa, amorse, bmorse, r0, z0, name, lattice, samplerate, savetemp)
    P.apply_async(run, (j, sim, lattice, adatom1), callback=After) # run simulation and ISF analysis in each process
P.close() # no further tasks will be submitted
P.join() # wait for all processes to finish

where sim, adatom1, and lattice are objects passed to the function run, which initiates the simulation.

However, I recently found out that each batch I run simultaneously (that is, each group of ncpus runs out of the total nrun simulation runs) gives the exact same results.

Can someone here explain how to fix this?

  • How do you obtain the results? Commented Feb 9, 2012 at 10:49
  • Do you get different results if you replace apply_async with a direct call to After(run(j,sim,lattice,adatom1))? Commented Feb 9, 2012 at 11:20
  • Solved, I think. Per advice here (stackoverflow.com/questions/6914240/…) I added scipy.random.seed in the calling function run. Commented Feb 9, 2012 at 13:34
  • Do not put "solved" in the question or in a comment. Please post an answer that explains the solution. Do not add critical details in comments; please update the question to include all the facts. Commented Feb 9, 2012 at 13:52
  • @MickeyDiamant can you post some code on how you solved it? An answer with actual code would be super helpful. Commented Apr 5, 2017 at 2:53

3 Answers


Just thought I would add an actual answer to make it clear for others.

Quoting the answer from aix in this question:

What happens is that on Unix every worker process inherits the same state of the random number generator from the parent process. This is why they generate identical pseudo-random sequences.

Use the random.seed() method (or the scipy/numpy equivalent) to set the seed properly. See also this numpy thread.
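For illustration, here is a minimal sketch of one way to apply this with a Pool initializer, so that each worker process re-seeds itself once at startup. The names reseed and worker are illustrative and not from the original post.

import multiprocessing as mp
import numpy as np

def reseed():
    # Runs once in every worker process; with no argument, NumPy seeds
    # from fresh OS entropy rather than the state inherited from the parent.
    np.random.seed()

def worker(j):
    # Each worker now draws from its own, independently seeded stream.
    return j, np.random.rand()

if __name__ == "__main__":
    with mp.Pool(4, initializer=reseed) as pool:
        print(pool.map(worker, range(8)))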


2 Comments

Does this guarantee that any library using random numbers will correctly start each new process with a new seed? Or do we need to set the seed for each library separately?
I believe that this answer actually depends on the method by which the new processes are created ("spawn", "fork", or "forkserver"). If you are using "fork" (the default on Unix), then yes, the worker process inherits the parent's state. If you are using "spawn", then everything is "remade" and the random number generator will be in its default state instead of being copied from the parent (unless you explicitly tell it to re-use the same seed). See the sketch below.
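A short sketch of the start-method point above, assuming Python 3 (where "spawn" is available on all platforms); the worker function here is illustrative.

import multiprocessing as mp
import numpy as np

def worker(j):
    return j, np.random.rand()

if __name__ == "__main__":
    # "spawn" starts each worker from a fresh interpreter, so the NumPy RNG
    # state is re-initialized in every child; "fork" (the Unix default)
    # copies the parent's state into all workers instead.
    mp.set_start_method("spawn")
    with mp.Pool(4) as pool:
        print(pool.map(worker, range(8)))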

This is an unsolved problem. Try to generate a unique seed for each process. You can add the code below to the beginning of your function to overcome the issue.

np.random.seed((os.getpid() * int(time.time())) % 123456789)
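For context, a sketch of where that line would go, using a worker signature that mirrors the run function from the question (the simulation body itself is elided):

import os
import time
import numpy as np

def run(j, sim, lattice, adatom1):
    # Re-seed at the top of the worker so each process draws its own
    # pseudo-random sequence instead of the one inherited from the parent.
    np.random.seed((os.getpid() * int(time.time())) % 123456789)
    # ... the rest of the simulation and ISF analysis as before ...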

2 Comments

Is os.getpid() unique for every process? Since the processes/workers are created at similar moments, is there not a chance that this creates processes which will use the same seed?
Yes, each process has a unique pid (process ID) no matter how it is created. On the other hand, threads in the same process share the same pid, of course.

A solution for the problem was to use scipy.random.seed() in the function run, which assigns a new seed to the random functions called from run.

A similar problem (from which I obtained the solution) can be found in "multiprocessing.Pool seems to work in Windows but not in ubuntu?".

2 Comments

Is there no way to set the random seed for every process that might use random numbers? Say one uses the modules random, numpy, scipy, tensorflow, and who knows what else. Is the only way to make sure each process has a different random seed to go through each of these and manually set the state?
You can pass a seed number to each process as an input argument if you don't want to set them manually, e.g. pool.map(func, seedlist), and in func: def func(myseed): np.random.seed(myseed) (see the sketch below).
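A minimal, self-contained sketch of that approach, with illustrative names, where each task receives its own explicit seed:

import multiprocessing as mp
import numpy as np

def func(myseed):
    # Seed this task explicitly from the argument it was given.
    np.random.seed(myseed)
    return np.random.rand()

if __name__ == "__main__":
    seedlist = range(8)  # one distinct seed per task
    with mp.Pool(4) as pool:
        print(pool.map(func, seedlist))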
