I am trying to learn to use Python's multiprocessing module. To do so, I am trying to solve problems from ProjectEuler.net, as they are good candidates for parallel processing. The code below is based on problem 14, which asks you to find the number below 1,000,000 that produces the longest Collatz chain. (A Collatz chain repeatedly halves an even number and maps an odd number n to 3n+1 until it reaches 1; e.g. 13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1 is a chain of length 10.) My initial solution was to fling off 1,000,000 processes and try to solve them all at once; that was a disaster. My second attempt (the solution below) breaks the problem down into numProcs chunks.
My question is: are there ways to make this run better? Is there a way to do this without breaking the problem down into chunks beforehand, i.e. is there a way to dynamically assign work to processes? In plain English, how do I say "Oooo! Processor 1 finished its problem; let's give it another" rather than "Here is your total workload, Processor 1"?
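For example, I have seen multiprocessing.Pool mentioned, and I am guessing something like the sketch below is what dynamic tasking would look like, with the pool handing out batches of numbers as each worker frees up (chainLength here is a hypothetical standalone helper I wrote for the sketch, not the generateChain from my code below):

from multiprocessing import Pool, cpu_count

def chainLength(n):
    """Return (chain length, starting number) for the Collatz chain of n."""
    start, length = n, 1
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        length += 1
    return length, start

if __name__ == '__main__':
    with Pool(cpu_count()) as pool:
        # chunksize=1000 batches the work, so a worker grabs a fresh batch
        # as soon as it finishes its previous one
        best = max(pool.imap_unordered(chainLength, range(1, 1000000), chunksize=1000))
    print('%s has a chain of length %s' % (best[1], best[0]))

But I don't know whether the pool's internal chunking is effectively the same thing I am already doing by hand, or something genuinely adaptive.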
def run2(j):
    """
    A multicore implementation of the Collatz sequence.

    j - the largest number whose Collatz chain we need to check
    """
    from multiprocessing import Process, Value, cpu_count
    import math

    def bazinga(nums, maxChain, biggest):
        """
        Computes the length of the Collatz chain for each number in
        the list 'nums', finds the longest chain and the number that
        produces it, and writes them to the shared memory values
        'maxChain' and 'biggest'.
        """
        for i in nums:
            a = len(generateChain(i, []))
            # hold the shared lock across the compare-and-update so two
            # processes can't interleave between the test and the writes
            with maxChain.get_lock():
                if a >= maxChain.value:
                    biggest.value = i
                    maxChain.value = a

    numProcs = cpu_count()
    chunksize = math.ceil(j / numProcs)  # total numbers to solve / #cpus (currently unused)
    maxChainSize = Value('I', 0)
    largestChainProducer = Value('I', 0)

    # generate the "chunks": deal the numbers into numProcs buckets
    numList = [[] for _ in range(numProcs)]
    for i in range(1, j):
        numList[i % numProcs].append(i)

    ps = []
    for i in range(numProcs):
        p = Process(target=bazinga, args=(numList[i], maxChainSize, largestChainProducer))
        p.start()
        ps.append(p)
    for p in ps:
        p.join()

    print('%s has a chain of length %s' % (largestChainProducer.value, maxChainSize.value))
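For reference, generateChain is defined elsewhere in my file; it is just a straightforward chain builder, roughly like this (it recursively appends each term to the list it is passed):

def generateChain(n, chain):
    """Append the Collatz chain starting at n to 'chain' and return it."""
    chain.append(n)
    if n == 1:
        return chain
    if n % 2 == 0:
        return generateChain(n // 2, chain)
    return generateChain(3 * n + 1, chain)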