Implementation of multithreading using concurrent.futures in Python

Question

I have written Python code that converts raw data from an STM microscope into PNG format, and it runs perfectly on my MacBook Pro.

Below is the simplified Python code:

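import glob
import os

# path, savepath and data_conv are defined elsewhere in the script
for root, dirs, file in os.walk(path):
    for dir in dirs:
        fpath = os.path.join(path, dir)
        os.chdir(fpath)
        spaths = os.path.join(savepath, dir)
        if not os.path.exists(spaths):
            os.mkdir(spaths)

        # glob matches relative to the current directory, i.e. fpath
        for files in glob.glob("*.sm4"):
            for file in files:
                data_conv(files, file, spaths)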

But it takes 30 to 40 minutes for 100 files.

Now I want to reduce the processing time using multithreading (with the concurrent.futures library). I tried to modify the code, following the YouTube video “Python Threading Tutorial” as an example.

But I have to pass too many arguments, such as root, dirs and file, to the executor.map() method, and I don't know how to resolve this.

Below is the simplified multithreaded Python code:
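For reference, executor.map works like the built-in map: when given several iterables, it calls the function with one item from each, paired by position, so each parameter gets its own iterable rather than a single value. A minimal sketch, where describe is a hypothetical stand-in for raw_data:

```python
import concurrent.futures


def describe(root, name, suffix):
    # hypothetical stand-in for raw_data: just combines its three arguments
    return f"{root}/{name}{suffix}"


roots = ["data", "data", "data"]
names = ["scan1", "scan2", "scan3"]
suffixes = [".sm4", ".sm4", ".sm4"]

with concurrent.futures.ThreadPoolExecutor() as executor:
    # calls describe(roots[i], names[i], suffixes[i]) for each position i;
    # results are returned in input order, not in completion order
    results = list(executor.map(describe, roots, names, suffixes))

print(results)  # ['data/scan1.sm4', 'data/scan2.sm4', 'data/scan3.sm4']
```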

import concurrent.futures
import glob
import os

def raw_data(root, dirs, file):
    for dir in dirs:
        fpath = os.path.join(path, dir)
        os.chdir(fpath)
        spaths = os.path.join(savepath, dir)
        if not os.path.exists(spaths):
            os.mkdir(spaths)

        for files in glob.glob("*.sm4"):
            for file in files:
                data_conv(files, file, spaths)

# root, dirs and file are only parameters of raw_data; they are not defined here
with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(raw_data, root, dirs, file)

NameError: name 'root' is not defined

Any suggestions are appreciated. Thank you.

If the workload is CPU-bound, you should use concurrent.futures.ProcessPoolExecutor instead, since Python threads cannot execute Python code in parallel due to the GIL. Do you need to wrap your call to executor.map with for root, dirs, file in os.walk(path):?
– Iain Shelvington, Commented Aug 25, 2021 at 14:18
Sorry, I am not an expert here; I don't know what the GIL is. But I need to reduce the processing time using multithreading or multiprocessing. As for "Do you need to wrap your call to executor.map with for root, dirs, file in os.walk(path):?": yes.
– user13058902, Commented Aug 25, 2021 at 14:28
Unless you are I/O-bound (lots of network/API calls, reading/writing files), multiprocessing is your best bet. The GIL prevents threads from running Python code in parallel (at the same time).
– Iain Shelvington, Commented Aug 25, 2021 at 14:29
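To illustrate the multiprocessing suggestion, here is a minimal sketch of a CPU-bound task run with ProcessPoolExecutor. cpu_heavy is a hypothetical stand-in for data_conv. The `if __name__ == "__main__":` guard matters on a MacBook: since Python 3.8 the default start method on macOS is "spawn", which re-imports the main module in each worker process.

```python
import concurrent.futures


def cpu_heavy(n):
    # hypothetical CPU-bound task standing in for data_conv: sum of squares
    return sum(i * i for i in range(n))


def run_all(sizes):
    # each call runs in a separate worker process, each with its own GIL,
    # so the calls can use several CPU cores at once
    with concurrent.futures.ProcessPoolExecutor() as executor:
        return list(executor.map(cpu_heavy, sizes))


if __name__ == "__main__":
    # with the "spawn" start method (macOS default since Python 3.8, and
    # Windows), workers import this module; the guard stops them from
    # starting a pool of their own
    print(run_all([10, 100, 1_000]))  # [285, 328350, 332833500]
```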
Any example or suggestion would help me understand how to implement the code.
– user13058902, Commented Aug 25, 2021 at 14:33

user13058902 · Accepted Answer · 2021-08-27 15:30:02Z


Thanks for the advice, Iain Shelvington and TheNoneMan.

pathlib does reduce the clutter I had in my code.

ProcessPoolExecutor worked for my CPU-intensive function:

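if __name__ == "__main__":  # needed for ProcessPoolExecutor with the spawn start method (macOS default)
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # raw_data receives one (root, dirs, files) tuple per call
        executor.map(raw_data, os.walk(path))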
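With executor.map(raw_data, os.walk(path)), each item that os.walk yields is a (root, dirs, files) tuple, so raw_data is called with that tuple as its single argument and has to unpack it. Also, an exception raised inside a worker is only re-raised when the results of map are iterated, so a call that never consumes them can fail silently. A minimal sketch of that shape, assuming a hypothetical raw_data that only lists the .sm4 files instead of converting them:

```python
import concurrent.futures
import os
import tempfile


def raw_data(entry):
    # entry is one (root, dirs, files) tuple produced by os.walk
    root, dirs, files = entry
    # hypothetical work: list the .sm4 files in this directory
    # (the real code would convert each one with data_conv)
    return sorted(os.path.join(root, f) for f in files if f.endswith(".sm4"))


def convert_tree(path):
    found = []
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # iterating over the results re-raises any exception from a worker;
        # without this loop, a failing raw_data call would go unnoticed
        for sm4_files in executor.map(raw_data, os.walk(path)):
            found.extend(sm4_files)
    return sorted(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "sample1"))
        open(os.path.join(tmp, "sample1", "a.sm4"), "w").close()
        open(os.path.join(tmp, "sample1", "notes.txt"), "w").close()
        print(convert_tree(tmp))
```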


TheNoneMan · Accepted Answer · 2021-08-31 14:47:40Z


First of all, as Iain Shelvington pointed out, data_conv seems to be CPU-intensive, so you won't see an improvement with ThreadPoolExecutor; use ProcessPoolExecutor instead. Second, executor.map calls the function once per set of arguments, so you have to pass one iterable per parameter, i.e. lists of arguments for raw_data. Assuming root and file are the same for every call and dirs is a list:

import concurrent.futures

with concurrent.futures.ProcessPoolExecutor() as executor:
    # one call per directory: raw_data(root, dirs[i], file)
    results = executor.map(raw_data, [root] * len(dirs), dirs, [file] * len(dirs))
    for result in results:
        pass  # collect your results here
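Instead of building [root] * len(dirs) lists, the arguments that never change can be supplied with itertools.repeat (map stops at the shortest iterable) or bound with functools.partial. With either form, raw_data runs once per directory, so its second parameter receives a single directory name rather than the whole dirs list. A sketch with a hypothetical raw_data that just returns a string; the executor is passed in only so the same helpers can run with threads or processes:

```python
import concurrent.futures
import functools
import itertools


def raw_data(root, dir_name, file):
    # hypothetical stand-in: handles ONE directory per call
    return f"{root}/{dir_name}: {file}"


def map_with_repeat(executor, root, dirs, file):
    # itertools.repeat yields the same value forever; map stops at the
    # shortest iterable, so there is exactly one call per entry in dirs
    return list(executor.map(raw_data, itertools.repeat(root), dirs, itertools.repeat(file)))


def map_with_partial(executor, root, dirs, file):
    # bind the arguments that are the same for every call
    worker = functools.partial(raw_data, root, file=file)
    return list(executor.map(worker, dirs))


if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as executor:
        print(map_with_repeat(executor, "data", ["s1", "s2"], "x"))
        print(map_with_partial(executor, "data", ["s1", "s2"], "x"))
```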

As a side note, you may find working with the filesystem more pleasant with pathlib, which has been part of the standard library since Python 3.4.
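One way pathlib can help here is by building full paths instead of calling os.chdir, which changes the working directory of the whole process (threads in one process would all share and overwrite it). The sketch below, with hypothetical helpers convert_one (standing in for data_conv, whose real signature is not shown) and build_tasks, creates the output folders up front and then maps one task per .sm4 file found in each immediate subdirectory:

```python
import concurrent.futures
import tempfile
from pathlib import Path


def convert_one(sm4_path, out_dir):
    # hypothetical stand-in for data_conv: the real function would read
    # sm4_path and write a PNG into out_dir; here it only returns the name
    return out_dir / (sm4_path.stem + ".png")


def build_tasks(path, savepath):
    # one (sm4_file, out_dir) task per .sm4 file in each immediate
    # subdirectory of path; output folders mirror the names under savepath
    tasks = []
    for sub in sorted(p for p in Path(path).iterdir() if p.is_dir()):
        out_dir = Path(savepath) / sub.name
        out_dir.mkdir(parents=True, exist_ok=True)
        for sm4 in sorted(sub.glob("*.sm4")):
            tasks.append((sm4, out_dir))
    return tasks


def convert_all(path, savepath):
    tasks = build_tasks(path, savepath)  # folders are created in one process
    sm4_files = [sm4 for sm4, _ in tasks]
    out_dirs = [out_dir for _, out_dir in tasks]
    with concurrent.futures.ProcessPoolExecutor() as executor:
        return list(executor.map(convert_one, sm4_files, out_dirs))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "raw" / "sample1").mkdir(parents=True)
        (Path(tmp) / "raw" / "sample1" / "scan.sm4").touch()
        print(convert_all(Path(tmp) / "raw", Path(tmp) / "png"))
```

Doing the directory creation before starting the pool keeps the workers free of shared-state side effects; each task only needs its own input file and output folder.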

edited Aug 31, 2021 at 14:47

answered Aug 25, 2021 at 14:55
