I have written a small Python program to check whether I understand how global variables are passed to "child" processes.

import multiprocessing
import time
import random

shared_var = range(12)

def f(x):
    global shared_var
    time.sleep(1+random.random())
    shared_var[x] = 100
    print x, multiprocessing.current_process(), shared_var
    return x*x

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    results = pool.map(f, range(8))
    print results
    print shared_var

When I run it I get

3 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 4, 5, 6, 7, 8, 9, 10, 11]
0 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
2 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 6, 7, 8, 9, 10, 11]
1 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
4 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 100, 5, 6, 7, 8, 9, 10, 11]
5 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 100, 6, 7, 8, 9, 10, 11]
6 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 100, 7, 8, 9, 10, 11]
7 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 6, 100, 8, 9, 10, 11]
[0, 1, 4, 9, 16, 25, 36, 49]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

This is logical: when a child process modifies a global variable, the copy-on-write mechanism copies it, so the change is visible only in the process that made it.

My surprise was when I modified the code to print the identifiers of the variables:

import multiprocessing
import time
import random

shared_var = range(12)

def f(x):
    global shared_var
    time.sleep(1+random.random())
    shared_var[x] = 100
    print x, multiprocessing.current_process(), shared_var, id(shared_var)
    return x*x

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    results = pool.map(f, range(8))
    print results
    print shared_var, id(shared_var)

And got:

3 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
0 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
1 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
2 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
6 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 100, 7, 8, 9, 10, 11] 4504973968
7 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 6, 100, 8, 9, 10, 11] 4504973968
4 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 100, 5, 6, 7, 8, 9, 10, 11] 4504973968
5 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 100, 6, 7, 8, 9, 10, 11] 4504973968
[0, 1, 4, 9, 16, 25, 36, 49]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968

The identifiers of all the variables (in the main process and in the spawned processes) are the same, while I expected a separate copy, with a different identifier, for each of the processes...

Does anyone know why I got these results? References on how multiprocessing deals with global variables that are read or written by the processes it creates would also be great. Thanks!

2 Answers

I think there's some confusion about the memory. You are using multiprocessing, not multithreading, so each worker runs in a separate process with its own virtual memory space. Each process therefore has its own copy of shared_var from the very beginning. That copy is what gets modified in each call to f(x), leaving the actual variable in __main__ unaffected.

You can check the chapter of the docs on sharing memory between processes, e.g. using multiprocessing.Array.
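To illustrate what sharing through multiprocessing.Array could look like, here is a minimal sketch written for Python 3 (unlike the Python 2 code in the question). It assumes a Unix system and explicitly requests the "fork" start method; the names set_item and run_workers are made up for this example, and it uses plain Process objects rather than a Pool to keep things short.

```python
import multiprocessing

# Assumption: Unix with the "fork" start method, requested explicitly.
ctx = multiprocessing.get_context("fork")

# A shared array of 12 C ints, initialised to 0..11.
shared_arr = ctx.Array('i', list(range(12)))

def set_item(arr, x):
    # Writes go to the shared memory block, not to a private copy.
    arr[x] = 100

def run_workers():
    # Hypothetical helper: one process per index 0..7.
    workers = [ctx.Process(target=set_item, args=(shared_arr, x))
               for x in range(8)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return list(shared_arr)

if __name__ == '__main__':
    print(run_workers())
```

In contrast to the plain list in the question, the parent sees the writes made by the children here, because the array lives in memory that is shared between the processes.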

I'm not 100% sure why the address stays the same, but I think that since each new subprocess is created by forking the main process, which duplicates its virtual address space, objects keep the same virtual addresses in every child. Those identical virtual addresses end up mapped to different physical memory once a child writes to the data. That's why you see the same id but different values.
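The effect can be reproduced without multiprocessing at all. The following is a small sketch, assuming Python 3 on a Unix system where os.fork is available; the helper name child_view is made up for this example. The child modifies its copy of the list and reports its id and the new value back through a pipe.

```python
import os

shared_var = list(range(12))

def child_view(index):
    """Fork, modify shared_var in the child, return (id, value) seen there."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: this write only affects the child's own memory.
        os.close(read_fd)
        shared_var[index] = 100
        report = "%d %d" % (id(shared_var), shared_var[index])
        os.write(write_fd, report.encode("ascii"))
        os.close(write_fd)
        os._exit(0)
    # Parent process: read the child's report and wait for it to exit.
    os.close(write_fd)
    data = os.read(read_fd, 1024)
    os.close(read_fd)
    os.waitpid(pid, 0)
    child_id, child_value = data.decode("ascii").split()
    return int(child_id), int(child_value)

if __name__ == "__main__":
    child_id, child_value = child_view(3)
    print("child : id=%d value=%d" % (child_id, child_value))
    print("parent: id=%d value=%d" % (id(shared_var), shared_var[3]))
```

The two ids should match, since the child inherits the parent's virtual address layout, while the values differ, since the child's write never reaches the parent's memory.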

2 Comments

Does this mean that the variables are always copied to the child processes? I thought they were only copied when modified. Would the same happen for a custom class? In that case, which "copy" method is called?
Oh jeez, so many questions in a single comment :) 1. Correct. 2. I don't know the internals of forking, but the result is the same whether you copy immediately or on first write. 3. Python is very agnostic w.r.t. custom vs. non-custom classes, everything's an object; so my guess is: yes. 4. I don't understand the question. This is not C++, there is no notion of copy constructors. Feel free to post a separate question about memory management in Python, maybe people more knowledgeable than myself will provide more information. This is an exciting topic!

As you may know, id(x) in CPython returns the memory address of the object.

Please check https://superuser.com/questions/347765/is-virtual-memory-related-to-virtual-address-space-of-a-process and "Why Virtual Memory Address is the same in different process?". Basically, the operating system assigns each process its own virtual address space, so a process has no idea about the actual (physical) memory address of an object.
