
Say I have two queues, call them colors and numbers:

import Queue  # Python 2; renamed to queue in Python 3

colors = Queue.Queue()
numbers = Queue.Queue()

and they each contain several items:

for color in ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']:
    colors.put(color)
for i in xrange(20):
    numbers.put(i)

and a function to handle a combination of a color and a number:

def handle():
    while not colors.empty():
        color = colors.get()
        number = numbers.get()
        print "Foo: %s bar: %d" % (color, number)
        colors.task_done()
        numbers.task_done()

that will be threaded:

import threading

children = []
for i in xrange(num_threads):
    children.append(threading.Thread(target=handle))

but instead of just printing each color and a number, I want to print all possible combinations of colors and numbers. What is the most efficient way to do this? Here is what I would like the output to look like: http://pastebin.com/yhksKswr

The problem (well, feature, I suppose) is that Queue.get() removes the item it returns from the queue, so each item is only used once.

  • Use itertools. The problem you describe seems independent of Queue. Commented Jun 14, 2014 at 21:44
  • @BrianCain the reason I am using Queue is so that I don't have to worry about thread safety. I know how to do this with lists, but not queues. Commented Jun 14, 2014 at 21:46
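The itertools suggestion above refers to itertools.product, which generates the cartesian product in nested-loop order; a minimal illustration with toy lists (not the asker's data):

```python
import itertools

colors = ['red', 'green']
numbers = [1, 2, 3]

# product() yields every (color, number) pair in nested-loop order
pairs = list(itertools.product(colors, numbers))
print(pairs)
# [('red', 1), ('red', 2), ('red', 3), ('green', 1), ('green', 2), ('green', 3)]
```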

2 Answers


One approach you might take is to fill a single queue with tuples, each tuple holding a 'color' and a 'number'. In other words, generate the cartesian product of the two lists as an initial step, then hand them out to your threaded workers via the thread-safe Queue.
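A sketch of that approach, in Python 3 syntax (queue rather than the question's Python 2 Queue module) with an illustrative worker count, and collecting into a list instead of printing:

```python
import itertools
import queue      # Python 3 name; the Python 2 module is Queue
import threading

colors = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']
numbers = range(20)

# Pre-generate the cartesian product and load it into one thread-safe queue.
jobs = queue.Queue()
for pair in itertools.product(colors, numbers):
    jobs.put(pair)

results = []

def handle():
    while True:
        try:
            # get_nowait() is atomic, avoiding the empty()/get() race
            color, number = jobs.get_nowait()
        except queue.Empty:
            return
        results.append("Foo: %s bar: %d" % (color, number))
        jobs.task_done()

threads = [threading.Thread(target=handle) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))  # 140 combinations (7 colors x 20 numbers)
```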

BTW, in my experience, parallelizing at the process level gets you more bang for your buck than threading in Python. You might try using Redis or Celery to distribute your jobs across many workers (executing on the same or different machines).




It seems this may be too sequential and small to be threaded. The key to solving a problem with a parallel algorithm is that it must break into subproblems of approximately equal size, each subproblem must be reasonably computationally intensive (so the overhead of creating a new thread is worth paying), and no subproblem may depend on the solution to another (otherwise one thread sits idle waiting on another).

You're going to have to keep track of what the current color and number are and iterate over both of them, so it could look something like this:

def make_combination(c, n):
    # make a combination from one color and one number
    pass

for color in colors:
    for number in numbers:
        t = threading.Thread(target=make_combination, args=(color, number))
        t.start()  # start(), not run() -- run() would execute in the current thread

But since it takes so long to create a thread, you would be better off if you just called make_combination in the loop.

If you really want to do it with threads, I would:

  1. Initialize the colors Queue, and keep the numbers in a plain list.
  2. Create n threads.
  3. Let each thread get a color from the Queue, then print that color with each number.
  4. Each thread repeats #3 until the colors Queue is empty.

So:

import threading
import Queue  # Python 2 module name

colors = Queue.Queue()
for color in ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']:
    colors.put(color)
numbers = list(range(20))  # We won't be using it like a queue, so just make it a list.

def handle():
    while not colors.empty():
        color = colors.get()
        for i in numbers:
            print color, i  # Edit this to get it to print what you want

for i in range(num_threads):
    threading.Thread(target=handle).start()

But it's important to note that this will almost never print in order.

And with multiprocessing.Pool:

import multiprocessing

# Initialize as lists
colors = [...]
numbers = [...]

def handle(c, n):
    # do something with c and n
    pass

p = multiprocessing.Pool(num_processes)
for c in colors:
    for n in numbers:
        # "Call handle(c, n) in another process." There are ways to get the
        # return value, too, if you want it. (See the docs about Pool and AsyncResult.)
        p.apply_async(handle, (c, n))

p.close()  # No more jobs to submit.
p.join()   # Wait for jobs to finish.

8 Comments

Thanks for the answer. The actual program I'm writing does perform a more computationally intensive process with each combination; I only used the colors and numbers to simplify the question. Also, your solution provides no way to control the number of threads created. Were there 1000 numbers, it would attempt to create several thousand threads.
Oh, well if you actually want concurrency, then you need multiprocessing. Threads in Python only run one at a time, because of the GIL. Multiple processes bypass the GIL and give you true SMP. It even has a Pool class which will handle the problem of limiting the number of processes to a given amount. Are you interested in seeing an example using Pool?
Never heard of that module before, but it sounds shiny, so yes!
AFAIK, it isn't in the library. I did a cursory Google search just out of curiosity. Regardless, if you want to actually use more than one processor core at a time, you need multiprocessing instead of threading anyway. If you use threads, you'll notice that they only use one core on your machine no matter how many threads you create. Processes, like those created with the multiprocessing module, will use as many cores as you want them to (bounded by how many you have, of course).
That isn't to say you couldn't write a ThreadPool class. But I don't see the point in it. Sorry to crush your idea, but I, like most people, see threads in Python as completely useless, because of the GIL (Google it). It would be great if they would fix the GIL, but it doesn't seem like that will ever happen.
