
I have something like the following:

d = {...} #a dictionary with strings
l1 = [...] #a list with stuff
l2 = [...] #a list with numbers

...

for i in l1:
    for key in l2:
        #do some stuff
        ...
        if d[key] == i:
            print d[key]

and I would like to do the same using threads, for a performance improvement. I am thinking of something like:

import threading

d = {...} #a dictionary with strings
l1 = [...] #a list with stuff
l2 = [...] #a list with numbers

...

def test(i, key):
    #do the same stuff
    if d[key] == i:
        print d[j]

for i in l1:
    for key in l2:
        threading.Thread(target=test, args=(i, key)).start()

I'm not sure whether this is the best approach. My greatest fear is that I am not optimizing at all. Some fundamental ideas are:

  • d should be in shared memory (it can be accessed by all threads). I assume that no two threads will access the same entry.
  • Every (i, key) combination should be tested at the same time.

If in your opinion I should use another language, I would be glad if you could point it out. Help would be very much appreciated. Thanks in advance.

  • Threading in traditional Python implementations doesn't allow parallelism (GIL - Global Interpreter Lock) so you are probably not optimizing at all. Commented Jun 8, 2013 at 22:28
  • So what programming language would you recommend? Commented Jun 8, 2013 at 22:31

2 Answers


Traditional threading in Python (http://docs.python.org/2/library/threading.html) is limited in the most common runtimes by a "Global Interpreter Lock" (GIL), which prevents multiple threads from executing Python code simultaneously, regardless of how many cores or CPUs you have. Despite this limitation, traditional threading is still extremely valuable when your threads are I/O bound, such as handling network connections or executing DB queries, because they spend most of their time waiting for external events rather than computing.
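As a minimal sketch of that point (with a `time.sleep` standing in for real I/O such as a socket read or a DB query, and a made-up `fetch` task), threads help here because the waits overlap, so the total wall time is roughly that of a single task:

```python
import threading
import time

# Hypothetical I/O-bound task: the sleep stands in for a network round-trip,
# during which CPython releases the GIL so other threads can run.
def fetch(url, results, idx):
    time.sleep(0.1)
    results[idx] = "response for " + url

urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
results = [None] * len(urls)

threads = [threading.Thread(target=fetch, args=(u, results, i))
           for i, u in enumerate(urls)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
# elapsed is close to 0.1s rather than 0.3s: the three waits overlap.
```

With CPU-bound work in place of the sleep, the three threads would instead run one at a time and the total would be the sum of the three tasks.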

If your individual tasks are CPU bound, as your question implies, you will have better luck with the newer "multiprocessing" module (http://docs.python.org/2/library/multiprocessing.html):

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine.


2 Comments

Nice answer. I often see people saying 'don't bother using threads' when talking about Python but as you rightly point out for I/O bound operations it can still yield good perf benefits.
+1 for multiprocessing module, it is very useful and quite easy to use.

Your second code does nothing, since the return value of test gets discarded. Did you mean to keep print d[j]?

Unless test(i, j) is actually more complicated than you make it out to be, you are definitely not optimizing anything, as starting up a thread will take longer than accessing a dictionary. You might do better with this:

def test(i):
    for j in l2:
        if d[j] == i:
            print d[j]

for i in l1:
    threading.Thread(target=test, args=(i,)).start()

In general a few threads can improve performance; hundreds of threads just add overhead.
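One common way to keep the thread count fixed is to feed the work through a queue to a small pool of workers. Here is a sketch of that pattern with placeholder data in place of the question's `d`, `l1` and `l2`:

```python
import threading
try:
    import Queue as queue  # Python 2
except ImportError:
    import queue           # Python 3

# Placeholder data standing in for the question's d, l1 and l2.
d = {1: "a", 2: "b", 3: "c"}
l1 = ["a", "b", "x"]
l2 = [1, 2, 3]

tasks = queue.Queue()
matches = []
lock = threading.Lock()

def worker():
    while True:
        i = tasks.get()
        if i is None:        # sentinel value: no more work
            break
        for j in l2:
            if d[j] == i:
                with lock:   # protect the shared result list
                    matches.append(d[j])

NUM_WORKERS = 4
threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for i in l1:
    tasks.put(i)
for _ in threads:
    tasks.put(None)          # one sentinel per worker
for t in threads:
    t.join()
```

Reading `d` from several threads without a lock is safe here; the lock only guards the shared `matches` list that the workers append to.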

The global interpreter lock does not necessarily make threading useless for increasing performance in Python. Many standard library functions release the global interpreter lock while they do their heavy lifting. For an example this simple, though, there is probably no parallelism to be gained.

