Using multi-threading to process an image faster on python?

Question

On a Python + Python Image Library script, there's a function called processPixel(image,pos) that calculates a mathematical index in function of an image and a position on it. This index is computed for each pixel using a simple for loop:

for x in range(image.size[0)):
    for y in range(image.size[1)):
        myIndex[x,y] = processPixel(image,[x,y])

This is taking too much time. How could threading be implemented to split the work speeding itup? How faster could a multi-threaded code be? Specifially, is this defined by the number of processor cores?

Also, I'm be very willing to bet that processPixel could be "numpy-ified", in which case you'll see an immense speedup over your current approach. — Joe Kington
– Joe Kington, Commented Jan 11, 2012 at 15:23

Björn Pollex · Accepted Answer · 2012-01-10 19:31:04Z

7

You cannot speed it up using threading due to the Global Interpreter Lock. Certain internal state of the Python interpreter is protected by that lock, which prevents different threads that need to modify that state from running concurrently.

You could speed it up by spawning actual processes using multiprocessing. Each process will run in its own interpreter, thus circumventing the limitation of threads. With multiprocessing, you can either use shared memory, or give each process its own copy/partition of the data.

Depending on your task, you can either parallelize the processing of a single image by partitioning it, or you can parallelize the processing of a list of images (the latter is easily done using a pool). If you want to use the former, you might want to store the image in an Array that can be accessed as shared memory, but you'd still have to solve the problem of where to write the results (writing to shared memory can hurt performance badly). Also note that certain kinds of communication between processes (Queues, Pipes, or the parameter/return-value passing of some function in the module) require the data to be serialized using Pickle. This imposes certain limitations on the data, and might create significant performance-overhead (especially if you have many small tasks).

Another way for improving performance of such operations is to try writing them in Cython, which has its own support for parallelization using OpenMP - I have never used that though, so I don't know how much help it can be.

edited Jan 10, 2012 at 19:31

answered Jan 10, 2012 at 12:04

Björn Pollex

77.1k30 gold badges206 silver badges290 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

freakish Over a year ago

If you are processing images (or your doing any operation requiring a lot of computing power) then you should also look at GPU. Python surely supports it.

aculich Over a year ago

As @freakish suggests, you should use GPU-based solutions for this kind of problem. What you said about the GIL and multiprocessing is correct, but still won't help for image processing. And when it comes to array processing I recommend using NumPy since it was designed for efficient array handling.

Guy · Accepted Answer · 2020-01-13 15:08:44Z

1

Take a look at Doug Hellmans tutorial on multiprocessing. As Björn points out, there are various issues regarding parallel processing which you'll need to get a hang of, but it can really be worth the effort.

Tip: You can use multiprocessing.cpu_count() to check the number of cores available to you.

edited Jan 13, 2020 at 15:08

answered Jan 10, 2012 at 12:12

Guy

3794 silver badges10 bronze badges

1 Comment

Monica Heddneck Over a year ago

linkrot on the tutorial

Glorfindel · Accepted Answer · 2023-03-06 19:00:47Z

Here are a list of library that you'll want to explore for doing efficient image processing:

OpenCV - is a library of programming functions for real time computer vision and image manipulation that contains Python bindings.

PyOpenCL lets you access GPUs and other massively parallel compute devices from Python.

PyCUDA is a sister project to PyOpenCL

NumPy and SciPy are fundamental packages for doing scientific computing that may be helpful with the above packages for doing efficient image and array processing.

Also note that for doing image processing the multiprocessing library that some people suggest is not going to help you efficiently handle image processing, so you should avoid using operating system threads to do this. If for some reason you do need coarse-grained parallelism, then you can use python library for MPI, but you probably want to stick with GPU-based libraries.

Collectives™ on Stack Overflow

Using multi-threading to process an image faster on python?

3 Answers 3

2 Comments

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

2 Comments

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related