I am trying to learn to use Python's multiprocessing module. To do so, I am trying to solve problems from ProjectEuler.net, as they are good candidates for parallel processing. The code below is based on problem 14, which asks you to find the number below 1,000,000 that produces the longest Collatz chain. (A Collatz chain repeatedly halves an even number and maps an odd number n to 3n+1 until it reaches 1; e.g. 13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1 is a chain of length 10.) My initial solution was to fling off 1,000,000 processes and try to solve them all at once; that was a disaster. My second attempt (the solution below) breaks the problem down into numProcs chunks.
My question is: are there ways to make this run better? Is there a way to do this without breaking the problem down into chunks beforehand, i.e. is there a way to dynamically assign work to processes? In plain English, how do I say "Oooo! Processor 1 finished its problem; let's give it another" rather than "Here is your total workload, Processor 1"?
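For example, I have seen multiprocessing.Pool mentioned, and I am guessing something like the sketch below is what dynamic tasking would look like, with the pool handing out batches of numbers as each worker frees up (chainLength here is a hypothetical standalone helper I wrote for the sketch, not the generateChain from my code below):

from multiprocessing import Pool, cpu_count

def chainLength(n):
    """Return (chain length, starting number) for the Collatz chain of n."""
    start, length = n, 1
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        length += 1
    return length, start

if __name__ == '__main__':
    with Pool(cpu_count()) as pool:
        # chunksize=1000 batches the work, so a worker grabs a fresh batch
        # as soon as it finishes its previous one
        best = max(pool.imap_unordered(chainLength, range(1, 1000000), chunksize=1000))
    print('%s has a chain of length %s' % (best[1], best[0]))

But I don't know whether the pool's internal chunking is effectively the same thing I am already doing by hand, or something genuinely adaptive.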
def run2(j):
    """
    A multicore implementation of the Collatz sequence.

    j - the largest number whose Collatz chain we need to check
    """
    from multiprocessing import Process, Value, cpu_count
    import math

    def bazinga(nums, maxChain, biggest):
        """
        Computes the length of the Collatz chain for each number in
        the list 'nums', finds the longest chain and the number that
        produces it, and writes them to the shared memory values
        'maxChain' and 'biggest'.
        """
        for i in nums:
            a = len(generateChain(i, []))
            # hold the shared lock across the compare-and-update so two
            # processes can't interleave between the test and the writes
            with maxChain.get_lock():
                if a >= maxChain.value:
                    biggest.value = i
                    maxChain.value = a

    numProcs = cpu_count()
    chunksize = math.ceil(j / numProcs)  # total numbers to solve / #cpus (currently unused)
    maxChainSize = Value('I', 0)
    largestChainProducer = Value('I', 0)

    # generate the "chunks": deal the numbers into numProcs buckets
    numList = [[] for _ in range(numProcs)]
    for i in range(1, j):
        numList[i % numProcs].append(i)

    ps = []
    for i in range(numProcs):
        p = Process(target=bazinga, args=(numList[i], maxChainSize, largestChainProducer))
        p.start()
        ps.append(p)
    for p in ps:
        p.join()

    print('%s has a chain of length %s' % (largestChainProducer.value, maxChainSize.value))
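For reference, generateChain is defined elsewhere in my file; it is just a straightforward chain builder, roughly like this (it recursively appends each term to the list it is passed):

def generateChain(n, chain):
    """Append the Collatz chain starting at n to 'chain' and return it."""
    chain.append(n)
    if n == 1:
        return chain
    if n % 2 == 0:
        return generateChain(n // 2, chain)
    return generateChain(3 * n + 1, chain)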