Is it possible to handle shared-memory parallel tasks in Python? My task should run in parallel across several cores (the threading module does not fit here; as far as I know, the only way to do that is multiprocessing). There are lots of tasks, for which I create a thread pool (in Python's case, a process pool). I then need to initialize these threads (processes) with a lot of data from the main thread (process). The threads process this data and return new data (again, a lot of it) to the main thread (process). What I see as a huge overhead is that with processes, each task must copy the data into the new process and do the same again after finishing, whereas with threads this is eliminated. That should be a huge speed-up. Can I achieve this speed-up with Python?
2 Answers
Yes, the multiprocessing module does support placing objects in shared memory. See the documentation.
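For example, here is a minimal sketch using `multiprocessing.Array`, which is backed by true shared memory: the worker modifies the buffer in place, and nothing is pickled back to the parent.

```python
from multiprocessing import Process, Array

def double_in_place(shared):
    # The worker writes directly into the shared buffer; no copy is
    # returned to the main process.
    for i in range(len(shared)):
        shared[i] *= 2

if __name__ == "__main__":
    data = Array('d', [1.0, 2.0, 3.0, 4.0])  # 'd' = C double
    p = Process(target=double_in_place, args=(data,))
    p.start()
    p.join()
    print(list(data))  # the main process sees the worker's writes
```

The same pattern works with `multiprocessing.Value` for a single scalar; both take an optional `lock` argument if several workers write concurrently.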
7 Comments
itun
That's nice. Do you know whether it works as true shared memory on all platforms (and not as fake sharing, i.e. copies of memory that are synchronized, or memory copied again to a new place that all participants can access)? On Windows, for example, processes are created from scratch (there is no os.fork on Windows), and I don't recall that one process can access another process's memory.
NPE
@itun: AFAIK, Value and Array work as advertised on Windows (i.e. using true shared memory).
Padraic Cunningham
I think one difference is class vs. instance attributes: on Windows you need to use an instance attribute, because Windows does not support fork.
itun
I think there is still a problem here, as I told @user4815162342: this special shared memory is a new place to/from which I need to copy my data between the main and worker processes. That is double copying (to shared memory, then out of shared memory) instead of copying once from main to worker or from worker to main. That is an even worse solution.
user4815162342
Neither answer proposes copying the data anywhere - you are expected to keep it in shared memory (be it mmap, sqlite or multiprocessing-based) to begin with.
Threads won't help you due to the GIL serializing them. However, you can still use multiprocessing and share the data between processes using the mmap module or equivalent. This will require the data to be structured to be readable from the file, so you won't be able to use, say, dictionaries - but using a file-based storage such as sqlite will be fine.
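A minimal sketch of the mmap approach: both processes map the same backing file, and the worker's in-place writes are immediately visible to the parent without any pickling. The filename `shared.bin` and the 16-byte size are illustrative choices, not requirements.

```python
import mmap
import os
from multiprocessing import Process

SIZE = 16  # size of the shared region, in bytes

def worker(path):
    # Map the same file the parent created and write into it directly.
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), SIZE) as m:
            m[0:5] = b"hello"

if __name__ == "__main__":
    path = "shared.bin"
    with open(path, "wb") as f:
        f.write(b"\x00" * SIZE)  # pre-size the backing file
    p = Process(target=worker, args=(path,))
    p.start()
    p.join()
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), SIZE) as m:
        print(bytes(m[0:5]))  # b'hello'
    os.remove(path)
```

Note the constraint mentioned above: the region is a flat byte buffer, so your data must have a fixed, file-like layout (e.g. structs, arrays, or an sqlite database file) rather than arbitrary Python objects.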
3 Comments
itun
But I see that there is still memory copying involved. I need to copy memory from the main process to this shared place and from it to the worker, then from the worker back to the shared place and from there to the main process. That is really bad; it would be cheaper to use sockets and copy directly between the worker and the main process.
itun
I need to directly access the main memory.
user4815162342
@itun The idea presented in the answer is to use the shared space directly, without copying, and access it directly (including modifications). Otherwise the answer would offer no improvement over a naïve use of multiprocessing.