Python parallelization for code to combine multiple images

Question

I am new to Python and am trying to parallelize a program that I somehow pieced together from the internet. The program reads all image files (usually multiple series of images such as abc001,abc002...abc015 and xyz001,xyz002....xyz015) in a specific folder and then combines images in a specified range. Most times, the number of files exceeds 10000, and my latest case requires me to combine 24000 images. Could someone help me with:

1. Taking 2 sets of images from different directories. Currently I have to move these images into 1 directory and then work in said directory.
2. Reading only specified files. Currently my program reads all files, saves the names in an array (I think it's an array; it could be a dictionary) and then uses only the images required to combine. Even if I specify a range of files, it still checks against all files in the directory, which takes a lot of time.
3. Parallel processing. I usually work with 10k files, sometimes more. These are images saved from the fluid simulations that I run at specific times. Currently, I save about 2k files at a time in separate folders and run the program to combine these 2000 files in one go. I then copy all the output files to a separate folder to keep them together. It would be great if I could use all 16 cores on the processor to combine all files in 1 go.

Image series 1 is like so. Consider it to be a series of photos of the cat walking towards the camera. Each frame is suffixed with 001,002,...,n.

Image series 2 is like so. Consider it to be a series of photos of the cat's expression changing with each frame. Each frame is suffixed with 001,002,...,n.

The code currently combines each frame from set1 and set2 to provide output.png as shown in the link here.

import sys
import os
from PIL import Image

keywords=input('Enter initial characters of image series 1    [Ex:Scalar_ , VoF_Scene_]:\n')
keywords2=input('Enter initial characters of image series 2    [Ex:Scalar_ , VoF_Scene_]:\n')

directory = input('Enter correct folder name where images are present   :\n')  # FOLDER WHERE IMAGES ARE LOCATED

result1 = {}  
result2={}

name_count1=0
name_count2=0
for filename in os.listdir(directory):
    if keywords in filename:
        name_count1 +=1
        result1[name_count1] = os.path.join(directory, filename)
    if keywords2 in filename:
        name_count2 +=1
        result2[name_count2] = os.path.join(directory, filename)

num1=input('Enter initial number of series:\n')
num2=input('Enter final number of series:\n')


num1=int(num1)
num2=int(num2)

if name_count1==(num2-num1+1):
    a1=1
    a2=name_count1
elif name_count2==(num2-num1+1):
    a1=1
    a2=name_count2
else:
    a1=num1
    a2=num2+1

for x in range(a1,a2):
    y=format(x,'05')        # '05' signifies number of digits in the series of file name Ex: [Scalar_scene_1_00345.png --> 5 digits], [Temperature_section_2_951.jpg --> 3 digits]. Change accordingly 
    y=str(y)
    for comparison_name1 in result1:
        for comparison_name2 in result2:
            test1=result1[comparison_name1]
            test2=result2[comparison_name2]
            if y in test1 and y in test2:
                a=test1
                b=test2
                test=[a,b]
                images = [Image.open(x) for x in test]
                widths, heights = zip(*(i.size for i in images))
                total_width = sum(widths)
                max_height = max(heights)

                new_im = Image.new('RGB', (total_width, max_height))

                x_offset = 0
                for im in images:
                    new_im.paste(im, (x_offset,0))
                    x_offset += im.size[0]
                # Save once, after both images have been pasted
                output_name='output'+y+'.png'
                new_im.save(os.path.join(directory, output_name))

You are trying to montage 24,000 input images into one big one around 150 images wide and 150 images tall? What are the dimensions of your input images? How does it know how to lay them out? What operating system do you use? You seem to be loading all the images into memory at the same time and then pasting them into the big canvas - that means you demand 2x the memory from your server - once to hold the images and again to hold the big canvas whereas you only need the canvas and one image at a time.
– Mark Setchell, Commented Jun 16, 2020 at 14:35
Did you consider making a video rather than an enormously large image?
– Mark Setchell, Commented Jun 16, 2020 at 14:36
@MarkSetchell I have two series of images, let's say abc001, abc002... and xyz001, xyz002. I load all these images and then, with a loop, I pick abc001 and xyz001, stack them horizontally, and output this as one image. Similarly for abc002 and xyz002, and so on. I then use these stacked images to make a movie, but I also need these stacked images for presentations.
– heavymetalthunder93, Commented Jun 16, 2020 at 14:43
Maybe you could show 2 series of 3 images each, with the images and their names, and show what the result might look like.
– Mark Setchell, Commented Jun 16, 2020 at 14:47
Please also try to answer the questions in my first comment. Thanks.
– Mark Setchell, Commented Jun 16, 2020 at 14:48

Mark Setchell · Accepted Answer · 2020-06-17 08:33:02Z

2

I did a Python version as well. It's not quite as fast, but it is maybe closer to your heart :-)

#!/usr/bin/env python3

import cv2
import numpy as np
from multiprocessing import Pool

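def doOne(params):
    """Append the two input images side-by-side to output the third."""
    imA = cv2.imread(params[0], cv2.IMREAD_UNCHANGED)
    imB = cv2.imread(params[1], cv2.IMREAD_UNCHANGED)
    # cv2.imread returns None rather than raising if a file cannot be read
    if imA is None or imB is None:
        raise FileNotFoundError(f'Could not read {params[0]} or {params[1]}')
    # np.hstack needs both images to have the same height and channel count
    res = np.hstack((imA, imB))
    cv2.imwrite(params[2], res)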


if __name__ == '__main__':

    # Build the list of jobs - each entry is a tuple with 2 input filenames and an output filename
    jobList = []
    for i in range(1000):
        # Horizontally append a-XXXXX.png to b-XXXXX.png to make c-XXXXX.png
        jobList.append( (f'a-{i:05d}.png', f'b-{i:05d}.png', f'c-{i:05d}.png') )

    # Make a pool of processes - 1 per CPU core    
    with Pool() as pool:
        # Map the list of jobs to the pool of processes
        pool.map(doOne, jobList)
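
The question also asked about reading from two directories and touching only a given range of frames. One way to do that, as a rough sketch, is to build each filename from the frame number instead of listing the folder. The function name `build_jobs`, its parameters, and the folder names and prefixes in the demo are all made up for illustration, and it assumes both series share the same zero-padded numbering and extension:

```python
from pathlib import Path


def build_jobs(dir_a, prefix_a, dir_b, prefix_b, out_dir,
               first, last, digits=5, ext=".png"):
    """Return (input_a, input_b, output) path tuples for frames first..last.

    Filenames are built directly from the frame number, so neither input
    directory is ever listed or scanned.
    """
    jobs = []
    for n in range(first, last + 1):
        frame = f"{n:0{digits}d}"  # zero-pad, e.g. 7 -> "00007" when digits=5
        a = Path(dir_a) / f"{prefix_a}{frame}{ext}"
        b = Path(dir_b) / f"{prefix_b}{frame}{ext}"
        out = Path(out_dir) / f"output{frame}{ext}"
        jobs.append((str(a), str(b), str(out)))
    return jobs


if __name__ == "__main__":
    # Hypothetical folders and prefixes, for illustration only.
    for job in build_jobs("run1", "Scalar_", "run2", "VoF_Scene_",
                          "combined", 1, 3):
        print(job)
```

Each entry has the same (input A, input B, output) shape as the tuples in jobList above, so the list should be usable with pool.map(doOne, ...) as-is. The output folder has to exist before the workers start writing to it.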

answered Jun 17, 2020 at 8:33

Mark Setchell


2 Comments

heavymetalthunder93 Over a year ago

Ah this is perfect! I can use this to automate the whole process along with my simulation program! I think the Imagemagick trick has won me over though! It is just one simple line and it is quite easy to share this line with my team and tell them what each parameter is!! They'd appreciate the single line approach rather than the "fancy code"! Big cheers Mark!

Mark Setchell Over a year ago

Cool - glad it works for you. It was fun working on it. Remember questions (and answers) are free on Stack, so come back with a new question if you get stuck. Good luck with your project!

jcupitt · Accepted Answer · 2020-06-18 09:02:47Z

1

You can do this a little quicker with libvips. To join two images left-right, enter:

vips join left.png right.png result.png horizontal

To test, I made 200 pairs of 1200x800 PNGs like this:

for i in {1..200}; do cp x.png left$i.png; cp x.png right$i.png; done

Then tried a benchmark:

time parallel vips join left{}.png right{}.png result{}.png horizontal ::: {1..200}
real    0m42.662s
user    2m35.983s
sys     0m6.446s

With imagemagick on the same laptop I see:

time parallel convert left{}.png right{}.png +append result{}.png ::: {1..200}
real    0m55.088s
user    3m24.556s
sys     0m6.400s
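
If you would rather drive the same command from Python than from GNU Parallel, something along these lines might work. It is only a sketch: `vips_join_cmd` and `run_all` are names invented here, and it assumes the `vips` executable is on your PATH and takes its arguments in the order shown in the join command above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def vips_join_cmd(left, right, out):
    """Build the argument list for one `vips join ... horizontal` call."""
    return ["vips", "join", left, right, out, "horizontal"]


def run_all(pairs, workers=None):
    """Run one vips process per (left, right, out) tuple, several at a time.

    Threads are enough here because the heavy work happens in the child
    processes, not in Python itself.
    """
    cmds = [vips_join_cmd(*p) for p in pairs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # check=True raises CalledProcessError if any vips call fails
        list(pool.map(lambda c: subprocess.run(c, check=True), cmds))
```

To mirror the benchmark, you would call run_all with a list such as [(f"left{i}.png", f"right{i}.png", f"result{i}.png") for i in range(1, 201)]. I have not timed this variant.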

answered Jun 18, 2020 at 9:02

jcupitt


Mark Setchell · Accepted Answer · 2020-06-16 17:09:03Z

You can do that much faster without Python, and using multi-processing with ImageMagick or libvips.

The first part is all setup:

Make 20 images, called a-000.png ... a-019.png that go from red to blue:

convert -size 64x64 xc:red xc:blue -morph 18 a-%03d.png

Make 20 images, called b-000.png ... b-019.png that go from yellow to magenta:

convert -size 64x64 xc:yellow xc:magenta -morph 18 b-%03d.png

Now append them side-by-side into c-000.png ... c-019.png

for ((f=0;f<20;f++))
do
    z=$(printf "%03d" $f)
    convert a-${z}.png b-${z}.png +append c-${z}.png
done

Those images look like this:

If that looks good, you can do them all in parallel with GNU Parallel:

seq -f "%03g" 0 19 | parallel convert a-{}.png b-{}.png +append c-{}.png

Benchmark

I did a quick benchmark and made 20,000 images a-00000.png...a-19999.png and another 20,000 images b-00000.png...b-19999.png with each image 1200x800 pixels. Then I ran the following command to append each pair horizontally and write 20,000 output images c-00000.png...c-19999.png:

seq -f "%05g" 0 19999 | parallel --eta convert a-{}.png b-{}.png +append c-{}.png

and that takes 16 minutes on my MacBook Pro with all 12 CPU cores pegged at 100% throughout. Note that you can:

add spacers between the images,
write annotation onto the images,
add borders,
resize

if you wish and do lots of other processing - this is just a simple example.
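
If you stay in Python, the spacer idea can be sketched with Pillow, which the question's code already uses. This is a minimal illustration rather than an equivalent of the ImageMagick command: `join_with_spacer`, the gap width and the background colour are all arbitrary choices made here.

```python
from PIL import Image


def join_with_spacer(im_a, im_b, gap=10, background=(255, 255, 255)):
    """Paste two images side by side, separated by `gap` pixels of background."""
    width = im_a.width + gap + im_b.width
    height = max(im_a.height, im_b.height)
    canvas = Image.new("RGB", (width, height), background)
    canvas.paste(im_a, (0, 0))
    # The second image starts after the first image plus the spacer
    canvas.paste(im_b, (im_a.width + gap, 0))
    return canvas


if __name__ == "__main__":
    # Two solid-colour stand-ins for real frames.
    left = Image.new("RGB", (64, 64), "red")
    right = Image.new("RGB", (64, 48), "blue")
    combined = join_with_spacer(left, right, gap=8)
    print(combined.size)
```

Any area not covered by either image, such as the strip under a shorter second image, is left in the background colour.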

Note also that you can get even quicker times, in the region of 10-12 minutes, if you accept JPEG instead of PNG as the output format.

Thank you!! I shall try this out. I might need to get ImageMagick! I'll look into it. Cheers :)
