
I have something like the following:

d = {...} #a dictionary with strings
l1 = [...] #a list with stuff
l2 = [...] #a list with numbers

...

for i in l1:
    for key in l2:
        #do some stuff
        ...
        if d[key] == i:
            print d[key]

and I would like to do the same using threads, for a performance improvement. I am thinking of something like:

import threading

d = {...} #a dictionary with strings
l1 = [...] #a list with stuff
l2 = [...] #a list with numbers

...

def test(i, key):
    #do the same stuff
    if d[key] == i:
        print d[j]

for i in l1:
    for key in l2:
        threading.Thread(target=test, args=(i, key)).start()

I'm not sure whether this is the best approach. My greatest fear is that I am not optimizing at all. Some fundamental ideas are:

  • d should be in shared memory (it can be accessed by all threads). I assume that no two threads will access the same entry.
  • Every (i, key) combination should be tested at the same time.

If in your opinion I should use another language, I would be glad if you could point it out. Help would be very much appreciated. Thanks in advance.

  • Threading in traditional Python implementations doesn't allow parallelism (GIL - Global Interpreter Lock) so you are probably not optimizing at all. Commented Jun 8, 2013 at 22:28
  • So what programming language would you recommend? Commented Jun 8, 2013 at 22:31

2 Answers


Traditional threading in Python (http://docs.python.org/2/library/threading.html) is limited in the most common runtimes by a "Global Interpreter Lock" (GIL), which prevents multiple threads from executing Python code simultaneously, regardless of how many cores or CPUs you have. Despite this limitation, traditional threading is still extremely valuable when your threads are I/O bound, such as handling network connections or executing DB queries, because they spend most of their time waiting for external events rather than computing.
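As a minimal sketch of that point (with a `time.sleep` standing in for real I/O such as a socket read or a DB query, and a made-up `fetch` task), threads help here because the waits overlap, so the total wall time is roughly that of a single task:

```python
import threading
import time

# Hypothetical I/O-bound task: the sleep stands in for a network round-trip,
# during which CPython releases the GIL so other threads can run.
def fetch(url, results, idx):
    time.sleep(0.1)
    results[idx] = "response for " + url

urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
results = [None] * len(urls)

threads = [threading.Thread(target=fetch, args=(u, results, i))
           for i, u in enumerate(urls)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
# elapsed is close to 0.1s rather than 0.3s: the three waits overlap.
```

With CPU-bound work in place of the sleep, the three threads would instead run one at a time and the total would be the sum of the three tasks.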

If your individual tasks are CPU bound, as your question implies, you will have better luck with the newer "multiprocessing" module (http://docs.python.org/2/library/multiprocessing.html):

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine.


2 Comments

Nice answer. I often see people saying 'don't bother using threads' when talking about Python but as you rightly point out for I/O bound operations it can still yield good perf benefits.
+1 for multiprocessing module, it is very useful and quite easy to use.

Your second code does nothing, since the return value of test gets discarded. Did you mean to keep print d[j]?

Unless test(i, j) is actually more complicated than you make it out to be, you are definitely not optimizing anything, as starting up a thread will take longer than accessing a dictionary. You might do better with this:

def test(i):
    for j in l2:
        if d[j] == i:
            print d[j]

for i in l1:
    threading.Thread(target=test, args=(i,)).start()

In general a few threads can improve performance; hundreds of threads just add overhead.
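One common way to keep the thread count fixed is to feed the work through a queue to a small pool of workers. Here is a sketch of that pattern with placeholder data in place of the question's `d`, `l1` and `l2`:

```python
import threading
try:
    import Queue as queue  # Python 2
except ImportError:
    import queue           # Python 3

# Placeholder data standing in for the question's d, l1 and l2.
d = {1: "a", 2: "b", 3: "c"}
l1 = ["a", "b", "x"]
l2 = [1, 2, 3]

tasks = queue.Queue()
matches = []
lock = threading.Lock()

def worker():
    while True:
        i = tasks.get()
        if i is None:        # sentinel value: no more work
            break
        for j in l2:
            if d[j] == i:
                with lock:   # protect the shared result list
                    matches.append(d[j])

NUM_WORKERS = 4
threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for i in l1:
    tasks.put(i)
for _ in threads:
    tasks.put(None)          # one sentinel per worker
for t in threads:
    t.join()
```

Reading `d` from several threads without a lock is safe here; the lock only guards the shared `matches` list that the workers append to.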

The global interpreter lock does not necessarily make threading useless for increasing performance in Python. Many standard library functions release the global interpreter lock while they do their heavy lifting. For an example this simple, though, there is probably no parallelism to be gained.

