I have written a small Python program to check whether I understand how global variables are passed to child processes:
import multiprocessing
import time
import random

shared_var = range(12)

def f(x):
    global shared_var
    time.sleep(1 + random.random())
    shared_var[x] = 100  # modify the global list inside the worker
    print x, multiprocessing.current_process(), shared_var
    return x * x

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    results = pool.map(f, range(8))
    print results
    print shared_var
When I run it, I get:
3 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 4, 5, 6, 7, 8, 9, 10, 11]
0 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
2 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 6, 7, 8, 9, 10, 11]
1 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
4 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 100, 5, 6, 7, 8, 9, 10, 11]
5 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 100, 6, 7, 8, 9, 10, 11]
6 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 100, 7, 8, 9, 10, 11]
7 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 6, 100, 8, 9, 10, 11]
[0, 1, 4, 9, 16, 25, 36, 49]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
This makes sense: each child process modifies the global variable, and because of the copy-on-write mechanism the data is copied as soon as a child writes to it, so the change is visible only in that child process.
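The isolation described above can be reproduced without a pool. This is only a minimal sketch, assuming a POSIX system where `os.fork` is available; the helper name `write_in_child` is hypothetical and not part of `multiprocessing`. The child reports its view of the list back through a pipe so the parent can compare it with its own.

```python
import os

shared_var = list(range(12))

def write_in_child(index, value):
    """Fork, modify the global list in the child only, and return the child's view as text."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: this write changes only the child's private copy of the data.
        os.close(read_fd)
        shared_var[index] = value
        os.write(write_fd, repr(shared_var).encode("ascii"))
        os.close(write_fd)
        os._exit(0)  # leave immediately, without running the parent's cleanup code
    # Parent process: read what the child saw, then reap the child.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks).decode("ascii")

if __name__ == '__main__':
    print(write_in_child(3, 100))  # the child's list, with index 3 replaced
    print(shared_var)              # the parent's list, still unmodified
```

The parent's list stays untouched because the child works on its own copy of the process memory after the fork.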
My surprise was when I modified the code to print the identifiers of the variables:
import multiprocessing
import time
import random

shared_var = range(12)

def f(x):
    global shared_var
    time.sleep(1 + random.random())
    shared_var[x] = 100
    print x, multiprocessing.current_process(), shared_var, id(shared_var)
    return x * x

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    results = pool.map(f, range(8))
    print results
    print shared_var, id(shared_var)
And got:
3 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
0 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
1 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
2 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
6 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 100, 7, 8, 9, 10, 11] 4504973968
7 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 6, 100, 8, 9, 10, 11] 4504973968
4 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 100, 5, 6, 7, 8, 9, 10, 11] 4504973968
5 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 100, 6, 7, 8, 9, 10, 11] 4504973968
[0, 1, 4, 9, 16, 25, 36, 49]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
The identifiers are the same in the main process and in every worker process, whereas I expected each process to have its own copy and therefore a different identifier.
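The same observation can be reproduced with a bare fork, which suggests it is not specific to `Pool`. This is a minimal sketch, assuming CPython (where `id()` is documented as the object's memory address) and a POSIX `os.fork`; `id_in_child` is a hypothetical helper. That address is relative to each process's own virtual address space, which may be why a forked child reports the same number for what is effectively a separate copy.

```python
import os

shared_var = list(range(12))

def id_in_child():
    """Fork, write to the global list in the child, and return the id() the child sees."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: write first so the child's data really diverges from the parent's.
        os.close(read_fd)
        shared_var[0] = 100
        os.write(write_fd, str(id(shared_var)).encode("ascii"))
        os.close(write_fd)
        os._exit(0)
    # Parent process: the message is short, so a single read is enough here.
    os.close(write_fd)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return int(data.decode("ascii"))

if __name__ == '__main__':
    print(id_in_child() == id(shared_var))  # compare the child's id with the parent's
    print(shared_var[0])                    # the parent's first element
```

If the comparison holds, equal identifiers across processes say nothing about whether the underlying memory is shared; they only show that the object sits at the same virtual address in each process.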
Does anyone know why I got these results? References on how multiprocessing handles global variables that are read or written by the processes it creates would also be great. Thanks!