
In Python I have a list of files that need to be uploaded.

Example code might look like this:

        for path, key in upload_list:
            upload_function.upload_file(path, key)

How can I multithread such tasks?

I already came across multiple solutions like process loops, see: How to Multithread functions in Python? but to me these seem kinda overkill just to process a list of files for upload. Is there maybe a smarter way?

  • I am afraid there is no simpler solution, although there are alternatives (e.g. coroutines), but those are "complex" as well. Commented May 15, 2022 at 12:35
  • Multithreading isn't difficult. Take a look at ThreadPoolExecutor in the concurrent.futures module. Commented May 15, 2022 at 12:39
  • Unless the file destinations are all different, this won't really run in parallel, no matter how it's coded. This is because it'll end up being re-sequentialised by the local network interface or the slowest net segment of the route to the destination. For example, if you had 1000 threads to upload 1000 files, all those upload requests are going to take the same amount of time to progress through the NIC / net anyway, and there will be 1000 mostly blocked threads. Threading improves things if there are different destinations per file, and the local NIC / net segment isn't the bottleneck to them. Commented May 16, 2022 at 6:21

1 Answer


It's not that complicated with ThreadPoolExecutor from the concurrent.futures module:

from concurrent.futures import ThreadPoolExecutor

def upload_files(upload_list):
    def __upload(val):
        # val is one (path, key) tuple from upload_list
        upload_function.upload_file(*val)

    with ThreadPoolExecutor() as executor:
        executor.map(__upload, upload_list)

upload_files(upload_list)

If you can modify upload_function.upload_file to receive the path and key as a single tuple parameter, the code can be simplified to

upload_list = [...]
with ThreadPoolExecutor() as executor:
    executor.map(upload_function.upload_file, upload_list)

executor.map(__upload, upload_list) will call the __upload function with one tuple from upload_list as its parameter, and it will run all of those calls (almost) simultaneously.
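
As a quick, self-contained check that the calls really overlap (fake_upload here is just a stand-in for upload_function.upload_file, used only for illustration):

import time
from concurrent.futures import ThreadPoolExecutor

def fake_upload(path, key):
    time.sleep(1)  # simulate network latency
    print(f"uploaded {path} as {key}")

upload_list = [(f"file{i}.txt", f"key{i}") for i in range(5)]

start = time.time()
with ThreadPoolExecutor() as executor:
    executor.map(lambda val: fake_upload(*val), upload_list)
print(f"done in {time.time() - start:.1f}s")  # roughly 1s rather than 5s, because the sleeps overlap

All five tasks are submitted up front, and the with block waits for them to finish before the elapsed time is printed.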

To control the number of threads, you can use the max_workers parameter:

with ThreadPoolExecutor(max_workers=len(upload_list)) as executor:
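
For a long list, one thread per file is usually more than you need; a fixed cap (8 here is just an arbitrary example) keeps the pool small, and the remaining items simply wait for a free worker:

with ThreadPoolExecutor(max_workers=8) as executor:
    executor.map(lambda val: upload_function.upload_file(*val), upload_list)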

8 Comments

Will it really run in parallel?
@bobah Yes, it will. Why do you think it won't?
Because of the GIL.
@bobah stackoverflow.com/questions/61667111/…. You can also check it easily by printing the values from __upload; you will see an unsynchronized, very messy printout.
I'm still kinda confused. Please have a look at my whole code; it's not clear to me how I can also fit the progress display into this: pastebin.com/nXnj8Hdz I want to display at what point in the list the process currently is (see the sketch after these comments).
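
For the progress question above, a possible sketch (not part of the original answer; the function name upload_files_with_progress is just illustrative) is to submit each upload individually and report as each future completes:

from concurrent.futures import ThreadPoolExecutor, as_completed

def upload_files_with_progress(upload_list):
    with ThreadPoolExecutor() as executor:
        # map each future back to its (path, key) pair so it can be reported
        futures = {executor.submit(upload_function.upload_file, path, key): (path, key)
                   for path, key in upload_list}
        for done, future in enumerate(as_completed(futures), start=1):
            path, key = futures[future]
            future.result()  # re-raises any exception from that upload
            print(f"{done}/{len(upload_list)} uploaded: {path}")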
