
I wish to run several instances of a simulation in parallel, but with each simulation having its own independent data set.

Currently I implement this as follows:

P = mp.Pool(ncpus) # Generate pool of workers
for j in range(nrun): # Generate processes
    sim = MDF.Simulation(tstep, temp, time, writeout, boundaryxy, boundaryz, relax, insert, lat, savetemp)
    lattice = MDF.Lattice(tstep, temp, time, writeout, boundaryxy, boundaryz, relax, insert, lat, kb, ks, kbs, a, p, q, massL, randinit, initvel, parangle, scaletemp, savetemp)
    adatom1 = MDF.Adatom(tstep, temp, time, writeout, boundaryxy, boundaryz, relax, insert, lat, ra, massa, amorse, bmorse, r0, z0, name, lattice, samplerate, savetemp)
    P.apply_async(run, (j, sim, lattice, adatom1), callback=After) # run simulation and ISF analysis in each process
P.close() # no further tasks will be submitted
P.join() # wait for all processes to finish

where sim, adatom1, and lattice are objects passed to the function run, which initiates the simulation.

However, I recently found out that each batch I run simultaneously (that is, each group of ncpus runs out of the total nrun simulation runs) gives the exact same results.

Can someone here explain how to fix this?

  • How do you obtain the results? Commented Feb 9, 2012 at 10:49
  • Do you get different results if you replace apply_async with a direct call to After(run(j,sim,lattice,adatom1))? Commented Feb 9, 2012 at 11:20
  • Solved, I think. Per advice here (stackoverflow.com/questions/6914240/…) I added scipy.random.seed in the calling function run. Commented Feb 9, 2012 at 13:34
  • Do not put "solved" in the question or in a comment. Please post an answer that explains the solution. Do not add critical details in comments; please update the question to include all the facts. Commented Feb 9, 2012 at 13:52
  • @MickeyDiamant can you post some code on how you solved it? An answer with actual code would be super helpful. Commented Apr 5, 2017 at 2:53

3 Answers


Just thought I would add an actual answer to make it clear for others.

Quoting the answer from aix in this question:

What happens is that on Unix every worker process inherits the same state of the random number generator from the parent process. This is why they generate identical pseudo-random sequences.

Use the random.seed() method (or the scipy/numpy equivalent) to set the seed properly. See also this numpy thread.
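For illustration, here is a minimal sketch of one way to apply this with a Pool initializer, so that each worker process re-seeds itself once at startup. The names reseed and worker are illustrative and not from the original post.

import multiprocessing as mp
import numpy as np

def reseed():
    # Runs once in every worker process; with no argument, NumPy seeds
    # from fresh OS entropy rather than the state inherited from the parent.
    np.random.seed()

def worker(j):
    # Each worker now draws from its own, independently seeded stream.
    return j, np.random.rand()

if __name__ == "__main__":
    with mp.Pool(4, initializer=reseed) as pool:
        print(pool.map(worker, range(8)))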


2 Comments

Does this guarantee that any library using random numbers will correctly start each new process with a new seed? Or do we need to set the seed for each library separately?
I believe that this answer actually depends on the method by which the new processes are created ("spawn", "fork", or "forkserver"). If you are using "fork" (the default on Unix), then yes, the worker process inherits the parent's state. If you are using "spawn", then everything is "remade" and the random number generator will be in its default state instead of being copied from the parent (unless you explicitly tell it to re-use the same seed). See the sketch below.
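A short sketch of the start-method point above, assuming Python 3 (where "spawn" is available on all platforms); the worker function here is illustrative.

import multiprocessing as mp
import numpy as np

def worker(j):
    return j, np.random.rand()

if __name__ == "__main__":
    # "spawn" starts each worker from a fresh interpreter, so the NumPy RNG
    # state is re-initialized in every child; "fork" (the Unix default)
    # copies the parent's state into all workers instead.
    mp.set_start_method("spawn")
    with mp.Pool(4) as pool:
        print(pool.map(worker, range(8)))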

This is an unsolved problem. Try to generate a unique seed for each process. You can add the code below to the beginning of your function to overcome the issue.

np.random.seed((os.getpid() * int(time.time())) % 123456789)
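For context, a sketch of where that line would go, using a worker signature that mirrors the run function from the question (the simulation body itself is elided):

import os
import time
import numpy as np

def run(j, sim, lattice, adatom1):
    # Re-seed at the top of the worker so each process draws its own
    # pseudo-random sequence instead of the one inherited from the parent.
    np.random.seed((os.getpid() * int(time.time())) % 123456789)
    # ... the rest of the simulation and ISF analysis as before ...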

2 Comments

Is os.getpid() unique for every process? Since the processes/workers are created at similar moments, is there not a chance that this creates processes which will use the same seed?
Yes, each process has a unique pid (process ID) no matter how it is created. On the other hand, threads in the same process share the same pid, of course.

A solution for the problem was to use scipy.random.seed() in the function run, which assigns a new seed to the random functions called from run.

A similar problem (from which I obtained the solution) can be found in "multiprocessing.Pool seems to work in Windows but not in ubuntu?".

2 Comments

Is there no way to set the random seed for every process that might use random numbers? Say one uses the modules random, numpy, scipy, tensorflow, and who knows what else. Is the only way to make sure each process has a different random seed to go through each of these and manually set the state?
You can pass a seed number to each process as an input argument if you don't want to set them manually, e.g. pool.map(func, seedlist), and in func: def func(myseed): np.random.seed(myseed) (see the sketch below).
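A minimal, self-contained sketch of that approach, with illustrative names, where each task receives its own explicit seed:

import multiprocessing as mp
import numpy as np

def func(myseed):
    # Seed this task explicitly from the argument it was given.
    np.random.seed(myseed)
    return np.random.rand()

if __name__ == "__main__":
    seedlist = range(8)  # one distinct seed per task
    with mp.Pool(4) as pool:
        print(pool.map(func, seedlist))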
