
I'm new to Python and I want to do parallel programming in the following code, using the multiprocessing module. How should I modify the code? I've been searching for examples that use Pool, but have found few that I can follow. Can anyone help me? Thank you.

Note that setinner and setouter are two independent functions and that's where I want to use parallel programming to reduce the running time.

def solve(Q,G,n):
    i = 0
    tol = 10**-4

    while i < 1000:

        inneropt,partition,x = setinner(Q,G,n)
        outeropt = setouter(Q,G,n)

        if (outeropt - inneropt)/(1 + abs(outeropt) + abs(inneropt)) < tol:
            break

        node1 = partition[0]
        node2 = partition[1]

        G = updateGraph(G,node1,node2)
        if i == 999:
            print "Maximum iteration reached"
        i += 1  # advance the counter so the loop can terminate
    print inneropt
  • That code is not indented correctly, and will have syntax errors. Can you correct your code? Commented Dec 12, 2013 at 23:27
  • As a side note, it looks like this code might be using some matrix or graph library that might be built as a C extension module and might run slow operations without holding the GIL (like NumPy). In that case you can just use threading instead of multiprocessing, or even get automatic data parallelism when relevant (like some parts of SciPy or KDT, if you have the prerequisites configured and installed right), in which case multiprocessing would actually slow you down… (a threading sketch of this idea follows these comments) Commented Dec 13, 2013 at 0:00
  • What is the relative running time of the 3 functions (setinner, setouter, updateGraph)? Are Q, G, n objects large? Do you update almost all their values (I assume they are compound objects) or only a small part on each iteration? Commented Dec 13, 2013 at 0:57
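
A minimal sketch of the threading alternative mentioned in the second comment above, assuming setinner and setouter really do spend most of their time in C extension code that releases the GIL (whether this helps at all depends entirely on that assumption):

import threading

# run the two independent calls in threads; each thread stores its result
results = {}

def run_inner():
    results['inner'] = setinner(Q, G, n)

def run_outer():
    results['outer'] = setouter(Q, G, n)

threads = [threading.Thread(target=run_inner), threading.Thread(target=run_outer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

inneropt, partition, x = results['inner']
outeropt = results['outer']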

1 Answer


It's hard to parallelize code that needs to mutate the same shared data from different tasks. So, I'm going to assume that setinner and setouter are non-mutating functions; if that's not true, things will be more complicated.

The first step is to decide what you want to do in parallel.


One obvious thing is to do the setinner and setouter at the same time. They're completely independent of each other, and always need to both get done. So, that's what I'll do. Instead of doing this:

inneropt,partition,x = setinner(Q,G,n)
outeropt = setouter(Q,G,n)

… we want to submit the two functions as tasks to the pool, then wait for both to be done, then get the results of both.

The concurrent.futures module (which requires a third-party backport in Python 2.x) makes it easier to do things like "wait for both to be done" than the multiprocessing module (which is in the stdlib in 2.6+), but in this case, we don't need anything fancy; if one of them finishes early, we don't have anything to do until the other finishes anyway. So, let's stick with multiprocessing.apply_async:

import multiprocessing

pool = multiprocessing.Pool(2) # we never have more than 2 tasks to run
while i < 1000:
    # start both tasks in parallel
    inner_result = pool.apply_async(setinner, (Q, G, n))
    outer_result = pool.apply_async(setouter, (Q, G, n))

    # sequentially wait for both tasks to finish and get their results
    inneropt,partition,x = inner_result.get()
    outeropt = outer_result.get()

    # the rest of your loop is unchanged

You may want to move the pool outside the function so it lives forever and can be used by other parts of your code. And if not, you almost certainly want to shut the pool down at the end of the function. (Later versions of multiprocessing let you just use the pool in a with statement, but that requires Python 3.3+, so on Python 2 you have to do it explicitly.)
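
Here's a rough sketch of what that explicit shutdown might look like (the try/finally arrangement is just one reasonable choice, not something from the original code):

pool = multiprocessing.Pool(2)
try:
    # ... the while loop from above, submitting work with pool.apply_async ...
    pass
finally:
    pool.close()  # stop accepting new tasks
    pool.join()   # wait for the worker processes to finish and exit

# On Python 3.3+, Pool is also a context manager (note that exiting the
# with block calls terminate(), not close()/join()):
# with multiprocessing.Pool(2) as pool:
#     ...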


What if you want to do more work in parallel? Well, there's nothing else obvious to do here without restructuring the loop. You can't do updateGraph until you get the results back from setinner and setouter, and nothing else is slow here.

But if you could reorganize things so that each loop's setinner were independent of everything that came before (which may or may not be possible with your algorithm—without knowing what you're doing, I can't guess), you could push 2000 tasks onto the queue up front, then loop by just grabbing results as needed. For example:

pool = multiprocessing.Pool() # let it default to the number of cores
inner_results = []
outer_results = []
# queue up all 1000 iterations' worth of tasks in advance
for i in range(1000):
    inner_results.append(pool.apply_async(setinner, (Q, G, n, i)))
    outer_results.append(pool.apply_async(setouter, (Q, G, n, i)))
# then consume the results in order
for i in range(1000):
    inneropt,partition,x = inner_results.pop(0).get()
    outeropt = outer_results.pop(0).get()
    # the rest of your loop is the same as before

Of course you can make this fancier.

For example, let's say you rarely need more than a couple hundred iterations, so it's wasteful to always compute 1000 of them. You can just push the first N at startup, and push one more every time through the loop (or N more every N times) so you never do more than N wasted iterations—you can't get an ideal tradeoff between perfect parallelism and minimal waste, but you can usually tune it pretty nicely.
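
A rough sketch of that tuning idea (N and the submit helper are made up for illustration, not part of the original answer):

pool = multiprocessing.Pool()
inner_results = []
outer_results = []
N = 50  # how many iterations to keep in flight; tune this for your workload

def submit(i):
    # hypothetical helper: queue both tasks for iteration i
    inner_results.append(pool.apply_async(setinner, (Q, G, n, i)))
    outer_results.append(pool.apply_async(setouter, (Q, G, n, i)))

for i in range(N):            # prime the pipeline with the first N iterations
    submit(i)

for i in range(1000):
    if i + N < 1000:          # keep roughly N iterations queued ahead of the consumer
        submit(i + N)
    inneropt, partition, x = inner_results.pop(0).get()
    outeropt = outer_results.pop(0).get()
    # ... rest of the loop; break as soon as the tolerance is met, and at most
    # about N extra iterations will have been computed in vain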

Also, if the tasks don't actually take that long, but you have a lot of them, you may want to batch them up. One really easy way to do this is to use one of the map variants instead of apply_async; this can make your fetching code a tiny bit more complicated, but it makes the queuing and batching code completely trivial (e.g., to map each func over a list of 100 parameters with a chunksize of 10 is just two simple lines of code).
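
For instance, a sketch of that map-with-chunksize idea (setinner_for_iteration and setouter_for_iteration are hypothetical single-argument wrappers around your functions, not names from the question):

# map each wrapper over 100 iteration indices; chunksize=10 makes the pool hand
# workers batches of 10 calls at a time instead of one task per call
inner_async = pool.map_async(setinner_for_iteration, range(100), chunksize=10)
outer_async = pool.map_async(setouter_for_iteration, range(100), chunksize=10)

# fetching is slightly different: each .get() returns a whole list of results
inner_results = inner_async.get()
outer_results = outer_async.get()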


4 Comments

For some clearly defined use cases like this, where two functions are set to run out-of-process asynchronously, the lelo helper package can be of use (in this case, it probably would work simply by decorating setinner and setouter): pypi.python.org/pypi/lelo/1.0rc3dev
@jsbueno: That looks pretty cool. I worry about automatic lazy objects in Python, because it's not actually possible to override every useful expression in Python, so you can end up with minval < lazy_result not actually evaluating the result. (In, say, C++, that's not a problem, because (a) among the things you can overload are assignment, copying, and conversion to a T&, and (b) static typing often forces strictness in ways Python can't.) But it will obviously work in the OP's case, because he immediately unpacks one and passes the other to abs.
@jsbueno: Actually, on second thought… in the OP's case, unpacking the first result into three variables is obviously going to block, which means he'd need to rewrite his code the same way as in my answer (do the operations, then get the results) to actually get any parallelism. Although just reversing the two calls would be a quick&dirty fix for that.
@abanert - I disagree about it not being possible to override every useful expression. The package "lelo" as it is may contain errors - but to make use of a value in Python, one has to go through one of the "dunder" (__NAME__) methods. Pure assignment is not overridable, but then all an assignment does is bind the lazy object itself to another name. When it gets to be printed, compared, or serialized, its value will have to be fetched.
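
To make the dunder-method point concrete, here is a toy sketch (not the lelo package itself, just an illustration) of a lazy wrapper whose value only gets computed when a comparison or a print goes through one of those methods:

class Lazy(object):
    """Toy lazy result: computes its value the first time a dunder method needs it."""
    def __init__(self, func, *args):
        self._func = func
        self._args = args
        self._computed = False

    def _force(self):
        if not self._computed:
            self._value = self._func(*self._args)
            self._computed = True
        return self._value

    def __lt__(self, other):   # lazy_result < something
        return self._force() < other

    def __gt__(self, other):   # reflected case: minval < lazy_result ends up here
        return self._force() > other

    def __repr__(self):        # printing forces evaluation too
        return repr(self._force())

result = Lazy(pow, 2, 10)      # nothing is computed yet
print(result < 2000)           # the comparison forces the computation: True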
