2

I have been looking to parallize some tasks in python but did not find anything useful. Here is the pseudo code for which I want to use parallelization:

# Here I define a list for the results. This list has to contain the results in the SAME order.
result_list = []

# Loop over a list of elements. I need to keep that loop. I mean,. in the final code this loop must be still there for specific reasons. Also, the results need to be stored in the SAME order. 
for item in some_list:

    # Here I use a method to process the item of the list. The method "task" is the function I want to parrallize
    result = task(item)

    # Here I append the result to the result list. The results must be in the SAME order as the input data
    result_list.append(result)

I want to parallelize the method task which takes a single item, processes it, and returns some results. I want to collect those results in the same order as in the original list.

The results in the final list result_list has to be in the same order as the items in the input list.

12
  • 2
    your specifications are unclear, as parts of your post contradict each other... You ask Or can I just do something like and provide code that has results = pool.map(process, alldata), but ask how to get the results? . Further down, you mention that you specifically want to use some kind of for loop, so the code you asked about in the middle would not be appicable anyway? Commented Jun 30, 2023 at 6:53
  • Sorry, I am not an expert in using multirpcessing in python! Sure I can make it more clear... Commented Jun 30, 2023 at 6:54
  • And your result has to be in order of input I assume? Commented Jun 30, 2023 at 6:54
  • @FlyingTeller Yes, as I have written two times. I will mention it again to be sure Commented Jun 30, 2023 at 6:56
  • 1
    Does this answer your question? Is there a simple process-based parallel map for python? Commented Jun 30, 2023 at 7:11

2 Answers 2

5

If you do need a for loop, then you could use ThreadPoolExecutor, which let's you submit your jobs to a pool of Threads and collect results afterwards:

from concurrent.futures import ThreadPoolExecutor
result_list=[]
with ThreadPoolExecutor(max_workers=16) as executor:
    futures=[]
    for item in some_list:
        future = executor.submit(task, item)
        futures.append(future)
    for f in futures:
        result_list.append(f.result())

ThreadPoolExecutor is interchangable with ProcessPoolExecutor, which could be a better choicce for CPU intensive tasks. If in doubt, try both and benchmark for your specific usecase. The complete docs can be found here

Note that depending on the number of items, it can be quite inefficient to do append many times. If you know the number of items before, then I suggest that you create the lists in the correct length from the start

If you would arange the input items in a list, you could also do

from multiprocessing import Pool

with Pool() as P:
    result_list= list(P.map(task, some_list))
Sign up to request clarification or add additional context in comments.

1 Comment

Worth mentioning ProcessPoolExecutor as it is interchangeable with ThreadPoolExecutor and may be a better choice if the parallel tasks are CPU-intensive
1

You can implement parallel processing easily with either threads or multi-processing.

There are no "rules" to guarantee which mechanism is going to be best for a particular use-case. However, a good starting consideration is to think about what the sub-task (thread or process) is going to be doing. If it's CPU-intensive then multi-processing is probably the best choice. For I/O bound activity, threads may be better.

Here's a simple pattern that shows how multi-threading and multi-processing can be easily interchanged.

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

USE_THREADS = False # set according to whether threads or processes are preferred

EXECUTOR = ThreadPoolExecutor if USE_THREADS else ProcessPoolExecutor

def task(n: int) -> int:
    return n * n

def main():
    with EXECUTOR() as executor:
        print(list(executor.map(task, range(10))))

if __name__ == '__main__':
    main()

Output:

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.