2

The use case is as follows: I have a script that runs a series of non-Python executables to reduce (pulsar) data. Right now I use subprocess.Popen(..., shell=True) and then subprocess's communicate function to capture the standard output and standard error from the non-Python executables, and I log the captured output using the Python logging module.
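
Roughly, each call currently looks like this (a minimal sketch; "reduce_step input.dat" is a made-up command standing in for the real executables):

    import logging
    import subprocess

    logging.basicConfig(level=logging.INFO)

    # One reduction step; the command string is a placeholder for the real tool.
    proc = subprocess.Popen("reduce_step input.dat", shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    logging.info("stdout:\n%s", out)
    if err:
        logging.error("stderr:\n%s", err)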

The problem is that only one core of the possible 8 gets used most of the time. I want to spawn multiple processes, each doing a part of the data set in parallel, and I want to keep track of progress. It is a script / program to analyze data from a low-frequency radio telescope (LOFAR). The easier it is to install / manage and test, the better. I was about to build code to manage all this, but I'm sure it must already exist in some easy library form.

1
  • "runs a series of non-python executables" All at the same time? Or serially? Please include a snippet of working code to explain what you're doing. Commented Nov 3, 2010 at 11:07

3 Answers

2

The subprocess module can start multiple processes for you just fine, and keep track of them. The problem, though, is reading the output from each process without blocking any of the others. Depending on the platform there are several ways of doing this: using the select module to see which process has data to be read, setting the output pipes to non-blocking with the fcntl module, or using threads to read each process's data (which is what subprocess.Popen.communicate itself does on Windows, since it doesn't have the other two options). In each case the devil is in the details, though.
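
For example, a rough sketch of the select-based approach on Unix might look like this (the command names and chunk files are placeholders for your real executables and data):

    import logging
    import os
    import select
    import subprocess

    logging.basicConfig(level=logging.INFO)

    # Placeholder commands standing in for the real data-reduction executables.
    commands = [["reduce_step", "chunk0.dat"], ["reduce_step", "chunk1.dat"]]

    procs = [subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
             for cmd in commands]
    pipes = dict((p.stdout, p) for p in procs)   # map each pipe to its process

    while pipes:
        # select blocks until at least one child has output ready (Unix only).
        readable, _, _ = select.select(list(pipes), [], [])
        for pipe in readable:
            data = os.read(pipe.fileno(), 4096)  # read whatever is available
            if data:
                logging.info("pid %d: %s", pipes[pipe].pid, data)
            else:
                pipes.pop(pipe).wait()           # EOF: this child is finished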

Something that handles all of this for you is Twisted, which can spawn as many processes as you want, and can call your callbacks with the data they produce (as well as in other situations).
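
A rough sketch using Twisted's ProcessProtocol might look like this (the "reduce_step" executable and chunk file names are made up):

    from twisted.internet import protocol, reactor

    class ReducerProtocol(protocol.ProcessProtocol):
        """Receives the output of one external reduction run."""
        running = 0   # class-level count of live children

        def __init__(self, name):
            self.name = name
            ReducerProtocol.running += 1

        def outReceived(self, data):
            # Hand the captured stdout to the logging module in real code.
            print("%s stdout: %r" % (self.name, data))

        def errReceived(self, data):
            print("%s stderr: %r" % (self.name, data))

        def processEnded(self, reason):
            ReducerProtocol.running -= 1
            if ReducerProtocol.running == 0:
                reactor.stop()

    # "reduce_step" is a placeholder for the real non-Python executable.
    for i in range(8):
        reactor.spawnProcess(ReducerProtocol("chunk%d" % i),
                             "reduce_step", ["reduce_step", "chunk%d.dat" % i])
    reactor.run()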



2

Maybe Celery will serve your needs.
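
For example, with a recent Celery you could wrap each external call in a task and fan the chunks out to workers (the module name, broker URL and the "reduce_step" command are all assumptions):

    # tasks.py -- hypothetical module name
    import subprocess
    from celery import Celery

    app = Celery("reduce", broker="amqp://")   # broker URL is an assumption

    @app.task
    def reduce_chunk(cmd):
        """Run one external reduction command; return its exit code and output."""
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        output, _ = proc.communicate()
        return proc.returncode, output

The script would then queue one task per chunk, e.g. reduce_chunk.delay(["reduce_step", "chunk0.dat"]), and collect the results as they finish.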


0

If I understand correctly what you are doing, I might suggest a slightly different approach. Try establishing a single unit of work as a function and then layer on the parallel processing after that. For example:

  1. Wrap the current functionality (calling subprocess and capturing output) into a single function. Have the function create a result object that can be returned; alternatively, the function could write out to files as you see fit.
  2. Create an iterable (list, etc.) that contains an input for each chunk of data for step 1.
  3. Create a multiprocessing Pool and then capitalize on its map() functionality to execute your function from step 1 for each of the items in step 2. See the python multiprocessing docs for details.

You could also use a worker/Queue model. The key, I think, is to encapsulate the current subprocess/output capture stuff into a function that does the work for a single chunk of data (whatever that is). Layering on the parallel processing piece is then quite straightforward using any of several techniques, only a couple of which were described here.
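
A minimal sketch of steps 1-3 with multiprocessing.Pool might look like this (the "reduce_step" executable and the chunk file names are placeholders):

    import logging
    import subprocess
    from multiprocessing import Pool   # standard library since Python 2.6

    logging.basicConfig(level=logging.INFO)

    def reduce_chunk(input_path):
        """Step 1: run one external reduction command for a single chunk of data."""
        cmd = ["reduce_step", input_path]   # placeholder executable name
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        output, _ = proc.communicate()
        return input_path, proc.returncode, output

    if __name__ == "__main__":
        chunks = ["chunk%d.dat" % i for i in range(8)]   # step 2: one input per chunk
        pool = Pool(processes=8)                         # step 3: one worker per core
        for path, code, output in pool.map(reduce_chunk, chunks):
            logging.info("%s exited with %s:\n%s", path, code, output)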

2 Comments

The problem is that the code has to run on cluster computers which have Python 2.5, and the multiprocessing module has only been in Python since 2.6. :/
It is nearly trivial to install Python in your home directory and use that one instead. That said, if you are on a cluster, you might have other options, including batch submission to a queueing system.
