34

I am trying to do some plotting in parallel to finish large batch jobs quicker. To this end, I start a thread for each plot I plan on making.

I had hoped that each thread would finish its plotting and close itself (as I understand it, Python closes threads when they get through all the statements in run()). That is not what happens: the script never finishes. Below is some code that reproduces the problem.

If the line that creates a figure is commented out, it runs as expected. Another possibly helpful detail is that it also runs as expected when you only spawn one thread.

import matplotlib.pyplot as plt
import time
import Queue
import threading

def TapHistplots():
    # for item in ['str1']:
    # It behaves as expected if the line above is used instead of the one below.
    for item in ['str1','str2']:
        otheritem = 1
        TapHistQueue.put((item, otheritem))
        makeTapHist().start()

class makeTapHist(threading.Thread):
    def run(self):
        item, otheritem = TapHistQueue.get()
        fig = FigureQueue.get()
        FigureQueue.put(fig+1)
        print item+':'+str(fig)+'\n',
        time.sleep(1.3)
        plt.figure(fig) # comment out this line and it behaves as expected
        plt.close(fig)

TapHistQueue = Queue.Queue(0)
FigureQueue = Queue.Queue(0)
def main():
    """Code in here runs only when this module is run directly"""
    start = time.time()
    FigureQueue.put(1)
    TapHistplots()
    while threading.activeCount()>1:
        time.sleep(1)
        print 'waiting on %d threads\n' % (threading.activeCount()-1),
    print '%ds elapsed' % (time.time()-start)

if __name__ == '__main__':
    main()

Any help is duly appreciated.

  • You've not actually said what goes wrong, although it sounds like some sort of thread concurrency issue. Commented Jan 11, 2011 at 16:19
  • I'm not actually sure what goes wrong. I do not get any errors, and a Python process continues running. Also, the print statement in the main thread that should fire every second stops doing so after the first second. A look in the task manager shows that the process continues to use a lot of CPU. I have limited experience with serious debugging, unfortunately. Commented Jan 11, 2011 at 16:42
  • Is it your intention to call makeTapHist().start() multiple times? Looks like maybe it should be outside the loop. Commented Jan 20, 2016 at 18:22

2 Answers

35

Why not just use multiprocessing? As far as I can tell from your description, threading won't help you much, anyway...

Matplotlib already uses threads so that you can display and interact with multiple figures at once. If you want to speed up batch processing on a multicore machine, you're going to need multiprocessing regardless.

As a basic example (Warning: This will create 20 small .png files in whatever directory you run it in!):

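import multiprocessing
import matplotlib.pyplot as plt
import numpy as np

def main():
    num_figs = 20
    # One (sequence length, figure index) pair per plot
    jobs = list(zip(np.random.randint(10, 1000, num_figs),
                    range(num_figs)))
    pool = multiprocessing.Pool()
    pool.map(plot, jobs)
    # Shut the worker processes down once every plot has been saved
    pool.close()
    pool.join()

def plot(args):
    num, i = args
    fig = plt.figure()
    data = np.random.randn(num).cumsum()
    plt.plot(data)
    plt.title('Plot of a %i-element brownian noise sequence' % num)
    fig.savefig('temp_fig_%02i.png' % i)
    plt.close(fig)  # release the figure so workers do not accumulate memory

# The guard is needed where child processes re-import this module (e.g. Windows)
if __name__ == '__main__':
    main()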

5 Comments

In addition, the multiprocessing version is super-fast compared to the threading version.
I just tried this script on Python 3.5.3, and it gets stuck... any help?
On Windows you need to protect the main() call via if __name__=='__main__': main()
I called the solution from a Flask API, and it solved the concurrency issue. However, the created processes do not exit, which causes a RAM shortage. Is this normal, since it is not a production server?
If you are using it in an always-running server, for example Flask, these two calls should also be added in order to completely close the created processes: pool.close() and pool.join()
6

For the pylab interface, there is a solution: Asynchronous plotting with threads.

Without pylab, there could be a different solution for each of matplotlib's backends (Qt, GTK, WX, Tk). The problem is that each GUI toolkit has its own GUI mainloop. You could look at how ipython deals with it.
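If the figures only need to be written out and never shown, one way to sidestep the mainloop question is to skip pyplot and attach the non-interactive Agg canvas to a Figure directly. The following is a minimal sketch under that assumption. The helper name render_png and its arguments are made up for illustration; Figure and FigureCanvasAgg are matplotlib's own classes. Note that this only avoids the GUI toolkit; it does not by itself make matplotlib thread-safe.

```python
import io

from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def render_png(values, title):
    """Draw one line plot and return it as PNG bytes (hypothetical helper).

    No pyplot call is made, so no GUI window or mainloop is created and
    nothing is registered in pyplot's global figure list.
    """
    fig = Figure(figsize=(4, 3))
    FigureCanvasAgg(fig)          # attach the non-GUI Agg canvas to the figure
    ax = fig.add_subplot(111)
    ax.plot(values)
    ax.set_title(title)
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    return buf.getvalue()


if __name__ == "__main__":
    # Example use: render a small plot and report how many bytes it takes.
    png_bytes = render_png([1, 4, 9, 16], "squares")
    print("rendered %d bytes" % len(png_bytes))
```

Because the figure is never handed to pyplot, there is nothing to close afterwards; it is garbage-collected like any other object once render_png returns.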

3 Comments

Insofar as I can tell, the link provided shows how to work with a single figure from many threads, not how to make plots in parallel. As I understood it, the backends were crucial to take into account when using matplotlib interactively (like ipython does). I'd appreciate it if you could explain how they apply to this example.
@Boris: backend does matter e.g., ideone.com/J42rn produces segmentation fault with default backend.
That link seems to be dead now..?
