Simple multithreaded loop in Python

Question

I have a functioning loop that runs a series of sequential tests on each of a list of elements. Unfortunately the list is growing quickly, as are the number of tests, and I'd like to see if I can add threading to it in such a way that it can perform the tests on several different elements simultaneously. The tests themselves must be run sequentially as each may rely on previous data returned by the tests to perform the next one, but each element is independent and doesn't require data from another element. Only after each element is complete would something need to be done on the entire dataset.

def do_some_tests_on_a_list_of_elements(element_list):
    do_some_stuff_here_to_set_up()

    for index, element in enumerate(element_list, start=1):
        element = do_some_stuff_on_element(index, element) 

    do_some_stuff_after_each_element_has_finished()

For example in this code, I'd want to do the setup, then allow the loop to process more than one element at a time via threading, and then after all elements are finished, do the final step on the dataset. What would be the simplest mechanism to achieve this?

It sounds like you want to process items in parallel on different CPUs. As I understand the terminology, this is different than running multiple threads on a single CPU. My suggestion is to search for the terms 'parallel processing', 'Python', and probably 'example' or 'tutorial'. I have done this some time ago — James Phillips
– James Phillips, Commented Apr 18, 2018 at 19:48

nosklo · Accepted Answer · 2018-04-18 19:48:34Z

3

Like all questions with threading, to answer accurately, it is very important to know what exactly the tests do, as in, if they are using the CPU to compute something, or if they are using the network.

Python's most common implementation (CPython) has a global interpreter lock (GIL) that prevents threadings from running python code in parallel. So if your tests are CPU-based, they can't run at the same time using threads. You have to use processes to do it.

If your tests on the other hand are IO-based (waiting for network packets) then you could use threads, but it is better to just use asynchronous requests insteads. You don't need extra threads to wait for network, you can just not wait.

answered Apr 18, 2018 at 19:48

nosklo

224k58 gold badges300 silver badges299 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Chris Over a year ago

Sounds like it would be parallel processing then as opposed to multithreading as the tests are not merely waiting on network IO. Thanks for the clarification.

Collectives™ on Stack Overflow

Simple multithreaded loop in Python

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related