Do I need a multi-core CPU to take advantage of the Python multiprocessing module? Also, can someone tell me how it works under the hood?
1 Answer
multiprocessing asks the OS to launch one or more new processes, running the same version of Python and the same version of your script. It can also set up pipes or other ways of sharing data directly between them.
It usually works like magic; when you peek under the hood, sometimes it looks like sausage being made, but you can usually understand the sausage grinders. The multiprocessing docs do a great job explaining things further (they're long, but then there's a lot to explain). And if you need even more under-the-hood knowledge, the docs link to the source, which is pretty readable Python code. If you have a specific question after reading, come back to SO and ask a specific question.
Meanwhile, you can get some of the benefits of multiprocessing without multiple cores.
The main benefit—the reason the module was designed—is parallelism for speed. And obviously, without 4 cores, you aren't going to cut your time down to 25%. But sometimes, you actually can get a bit of speedup even with a single core, especially if that core has "hyperthreading" or similar technologies. I've seen times come down to 80%, or even 60%. More commonly, they'll go up to 108% instead (because you did get a small benefit from hyperthreading, but the overhead cost was higher than the gain). But try it with your code and see.
Meanwhile, you get all of the side benefits:
- Concurrency: You can run multiple tasks at once without them blocking each other. Of course threads,
asyncio, and other techniques can do this too. - Isolation: You can run multiple tasks at once without the risk of one of them changing data that another one wasn't expecting to change.
- Crash protection: If a child task segfaults, only that task is affected. (Well, you still have to be careful of any side-effects—if it crashed in the middle of writing a file that another tasks expects to be in a consistent shape, you're still in trouble.)
You can also use the multiprocessing module without multiple processes. Sometimes you just want the higher-level API of the module, but you want to use it with threads; multiprocessing.dummy does that. And you can switch back and forth in a couple lines of code to test it both ways. Or you can use the higher-level concurrent.futures.ProcessPoolExecutor wrapper, if its model fits what you want to do. Besides often being simpler, it lets you switch between threads and processes by just changing one word in one line.
Also, redesigning your program around multiprocessing takes you a step closer to further redesigning it as a distributed system that runs on multiple separate machines. It forces you to deal with questions like how your tasks communicate without being able to share everything, without forcing you to deal with further questions like how they communicate without reliable connections.