Multi-threaded image processing with openCV in python

Question

I am quite new to python and have problems to parallize a part of my algorithm. Consider an input image thats need to be threshold in a certain way on pixel level. Since the algorithm only considers a specific area to calculate the threshold values, I'd like to run each chunk of the image in a seperate thread/process. And this is where I am stuck. I could not find a way that these threads work on the same image or how I can merge the results into a new image. Since I am usually from the java world I usually fighting the problem that I not want to interfere in other Thread. Therefore I just tried to pass each process the image.

def thresholding(img):
    stepSize = int(img.shape[0] / 10)
    futures = []
    with ProcessPoolExecutor(max_workers=4) as e:
        for y in range(0, img.shape[0], stepSize):
            for x in range(0, img.shape[1], stepSize):
                futures.append(e.submit(thresholdThread, y, x, img))
    concurrent.futures.wait(futures)
    return img


def thresholdThread(y, x, img):
    window_size = int(img.shape[0] / 10)
    window_shape = (window_size, window_size)
    window = img[y:y + window_shape[1], x:x + window_shape[0]]
    upper_bound, lower_bound, avg = getThresholdBounds(window, 0.6)

    for y_2 in range(0, window.shape[0]):
        for x_2 in range(0, window.shape[1]):
            tmp = img[y + y_2, x + x_2]
            img[y + y_2, x + x_2] = tmp if (tmp >= upper_bound or tmp <= lower_bound) else avg
    return str(avg)

As far as I understand python this does not work, because each process gets its own copy of img. But since img is of type ndarray of floats from numpy I have no idea if and how I could use shared object as descibed here.

FYI: I am using python 3.6.9. I do know that 3.7 is released, but installing everthing again so I can use spyder and openCV is not so easy.

Are you asking how to put all the pieces back together after they have been processed? — wwii
– wwii, Commented Dec 29, 2019 at 15:33
It woud be a solution I can live with. In my opinion a shared object on which each process would work on, would be a nicer solution, but merging the results in an efficient way is fine with me. — The End
– The End, Commented Dec 29, 2019 at 15:44
How big is the image? Why do you need to process the patches in parallel? Can you include getThresholdBounds? — wwii
– wwii, Commented Dec 29, 2019 at 16:06
The images are ~2MB, but can vary. One example has the size of 4864x3456 pixels. And I want to run this skript inside docker images which have no high CPU capacity, but up to 4 cores. Therefore parallelism would be a good way to speed things up. And getThresholdBounds may differ in my experiments. But basically I take the 0.6 * maxValueOfWindow for the upper bound and vice verser on the lower bound. — The End
– The End, Commented Dec 29, 2019 at 16:24
You are thresholding non-overlapping square patches? The patch shape is based on the first dimension of the array? the bottom and right patches may be smaller than the others? Greyscale? — wwii
– wwii, Commented Dec 29, 2019 at 17:24

wwii · Accepted Answer · 2019-12-31 18:32:33Z

You are not taking advantage of any of Numpy's vectorization techniques which can decrease processing time significantly. I'm assuming this is why you want to multiprocess operations on windows/chunks of the image(s) - I don't know what Docker is so I don't know whether that is a factor in your multiprocess approach.

Here is a vectorized solution with the caveat that it possibly excludes bottom and right edge pixels from the operations. If that is not acceptable no need to read any further.

The size of the right and bottom edge windows in your example are, more likely than not, different than the other windows. Looks like you arbitrarily chose a factor of ten to chunk up your image - if ten was an arbitrary choice, you can optimize the bottom and right edge deltas easily - I'll post that function at the end of the answer.

The image needs to be reshaped into patches to vectorize the operations . I've used an sklearn function sklearn.feature_extraction.image._extract_patches because it is convenient and allows creation of non-overlapping patches (which appears to be what you want). Notice the underscore prefix - this used to be an exposed function, image.extract_patches, but that has been deprecated. The function uses numpy.lib.stride_tricks.as_strided - it might be possible to just reshape the array but I haven't tried that.

Setup

import numpy as np
from sklearn.feature_extraction import image
img = np.arange(4864*3546*3).reshape(4864,3546,3)
# all shape dimensions in the following example derived from img's shape

Define the patch size (see opt_size below) and reshape the image.

hsize, h_remainder, h_windows = opt_size(img.shape[0])
wsize, w_remainder, w_windows = opt_size(img.shape[1])

# rgb - not designed for rgba
if img.ndim == 3:
    patch_shape = (hsize,wsize,img.shape[-1])
else:
    patch_shape = (hsize,wsize)

patches = image._extract_patches(img,patch_shape=patch_shape,
                                 extraction_step=patch_shape)
patches = patches.squeeze()

patches is a view of the original array changes to it will be seen in the original. Its shape is (8, 9, 608, 394, 3) There are 8x9, (608,394,3) windows/patches.

Find the upper and lower bounds of each patch; compare each pixel to the bounds for its patch; extract the indices for each pixel that is between its bounds and needs to be changed.

lower = patches.min((2,3)) * .6
lower = lower[...,None,None,:]
upper = patches.max((2,3)) * .6
upper = upper[...,None,None,:]
indices = np.logical_and(patches > lower, patches < upper).nonzero()

Find the mean of each patch then change the required pixel values,

avg = patches.mean((2,3))    # shape (8,9,3)
patches[indices] = avg[indices[0],indices[1],indices[-1]]

Function that puts it all together

def g(img, opt_shape=False):
    original_shape = img.shape

    # determine patch shape   
    if opt_shape:
        hsize, h_remainder, h_windows = opt_size(img.shape[0])
        wsize, w_remainder, w_windows = opt_size(img.shape[1])
    else:
        patch_size = img.shape[0] // 10
        hsize, wsize = patch_size,patch_size

    # constraint checking here(?) for
    #     number of windows,
    #     orphaned pixels

    if img.ndim == 3:
        patch_shape = (hsize,wsize,img.shape[-1])
    else:
        patch_shape = (hsize,wsize)

    patches = image._extract_patches(img,patch_shape=patch_shape,
                                     extraction_step=patch_shape)
    #squeeze??
    patches = patches.squeeze()

    #assume color (h,w,3)
    lower = patches.min((2,3)) * .6
    lower = lower[...,None,None,:]
    upper = patches.max((2,3)) * .6
    upper = upper[...,None,None,:]
    indices = np.logical_and(patches > lower, patches < upper).nonzero()

    avg = patches.mean((2,3))
##    del lower, upper, mask
    patches[indices] = avg[indices[0],indices[1],indices[-1]]

def opt_size(size):
    '''Maximize number of windows, minimize loss at the edge

    size -> int
       Number of "windows" constrained to 4-10
       Returns (int,int,int)
           size in pixels,
           loss in pixels,
           number of windows
    '''

    size = [(divmod(size,n),n) for n in range(4,11)]
    n_windows = 0
    remainder = 99
    patch_size = 0
    for ((p,r),n) in size:
        if r <= remainder and n > n_windows:
            remainder = r
            n_windows = n
            patch_size = p
    return patch_size, remainder, n_windows

Tested against your naïve process - I hope I executed it correctly. About a 35x improvement on the 4864x3546 color image. There are probably further optimizations maybe some wizards will comment.

Test using your chunk factor of ten:

#yours
def f(img):
    window_size = int(img.shape[0] / 10)
    window_shape = (window_size, window_size)

    for y in range(0, img.shape[0], window_size):
        for x in range(0, img.shape[1], window_size):

            window = img[y:y + window_shape[1], x:x + window_shape[0]]
            upper_bound = window.max((0,1)) * .6
            lower_bound = window.min((0,1)) * .6
            avg = window.mean((0,1))

            for y_2 in range(0, window.shape[0]):
                for x_2 in range(0, window.shape[1]):
                    tmp = img[y + y_2, x + x_2]
                    indices = np.logical_and(tmp < upper_bound,tmp > lower_bound)
                    tmp[indices] = avg[indices]


img0 = np.arange(4864*3546*3).reshape(4864,3546,3)
#get everything the same shape
size = img0.shape[0] // 10
h,w = size*10, size * (img0.shape[1]//size)
img1 = img0[:h,:w].copy()
img2 = img1.copy()

assert np.all(np.logical_and(img1==img2,img2==img0[:h,:w]))
f(img1)    # ~44 seconds
g(img2)    # ~1.2 seconds
assert(np.all(img1==img2))
if not np.all(img2==img0[:h,:w]):
    pass
else:
    raise Exception('did not change')

indices is an index array. It is a tuple of arrays, one for each dimension. indices[0][0],indices[1][0],indices[2][0] would be the index for one element in a 3d array. The complete tuple can be used to index multiple elements of an array.

>>> indices
(array([1, 0, 2]), array([1, 0, 0]), array([1, 1, 1]))
>>> list(zip(*indices))
[(1, 1, 1), (0, 0, 1), (2, 0, 1)]
>>> arr = np.arange(27).reshape(3,3,3)
>>> arr[1,1,1], arr[0,0,1],arr[2,0,2]
(13, 1, 20)
>>> arr[indices]
array([13,  1, 19])

# arr[indices] <-> np.array([arr[1,1,1],arr[0,0,1],arr[2,0,1]])

np.logical_and(patches > lower, patches < upper) returns a boolean array and nonzero() returns the indices of all the elements with a value of True.

I had to adapt your answer a little, since I am actually working on response map of the image. Therefore it is a simple ndarray of floats and not RGB values. But after that it was much faster and makes the parallism unnessary. So thanks you very much. But on the other hand I am still curious if you can work in a image from different processes. Is it really that unusual for python to work on shared memory?
I didn't try it on a 2d array. You probably had to change lower[...,None,None,:], upper[...,None,None,:], and avg[indices[0],indices[1],indices[-1]].?? Probably could automate that instead of hard coding. I'm curious whterh the losses at the edges affect your process.
Yeah, after figuring out what you did there, I changed exactly those lines. I put None at the end and removed indices[-1]. But it would be nice if you could explain the result of np.logical_and(patches > lower, patches < upper).nonzero(). I am not sure about the layout of indices

Collectives™ on Stack Overflow

Multi-threaded image processing with openCV in python

1 Answer 1

4 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

4 Comments

Your Answer

Sign up or log in

Post as a guest

Related