
I'm inexperienced at threading in Python and was trying to write a few simple multithreaded programs to get a bit more experience. I'm trying to send requests out to a pre-defined list of URLs.

When I try to execute the program, it instantly finishes and prints "End" with no exits or exceptions. The print call placed in thread_function never executes and no errors are thrown.

Any help would be appreciated.

import networking
import threading
import concurrent.futures

class concurrencyTest:
    def __init__(self, URLlist):
        self.URLlist = URLlist
        self.resourceDict = {}


        self._urlListLock = threading.Lock()
        self._resourceListLock = threading.Lock()

    def sendMultiThreadedRequests(self, threadNum=3):
        self.resourceDict = {}
        with concurrent.futures.ThreadPoolExecutor(max_workers=threadNum) as executor:
            results = executor.map(self.thread_function)
        

    
    def thread_function(self):
        print("You are in the thread_function")
        while True:
            with self._urlListLock:
                numOfRemainingURL = len(self.URLlist)
                print(numOfRemainingURL)
                if numOfRemainingURL == 0:
                    return

                urlToRequest = self.URLlist.pop()

            webpage = networking.getWebpage(urlToRequest)
            ##parse webpage or resource
            
            with self._resourceListLock:
                self.resourceDict[urlToRequest] = webpage
   
    
    def sendRegularRequests(self):
        self.resourceDict = {}
        for url in self.URLlist:
            resource = networking.getWebpage(url)
            self.resourceDict[url] = resource

    def updateURLpool(self):
        return "Not currently coded"


                  
def main():
    #The real urlList is a lot larger than just 3 URLs
    urlList = ["www.google.com","www.stackoverflow.com","www.reddit.com"]

    parTest = concurrencyTest(urlList)

    parTest.sendMultiThreadedRequests()
    
    print("End")

main()

  • Don't roll your own threading stuff unless you really want the exercise, just use multiprocessing. Commented Aug 5, 2020 at 16:32
  • Where does the networking module with getWebpage(..) come from? Can't find any reference to it online. Commented Aug 5, 2020 at 16:44
  • Just a custom module I wrote, so you wouldn't find it online Commented Aug 5, 2020 at 16:51
  • @user5423 If any of the answers provided was helpful please accept it, so others know that your question was answered. Commented Aug 5, 2020 at 18:07

2 Answers


executor.map() is for mapping the values of an iterable to a function call: it expects the function as the first argument and one or more iterables (e.g. a list) as further arguments, and maps their contents onto the function.

For example:

executor.map(self.thread_function, self.URLlist)

will call thread_function(url) once for each value in URLlist. (Note that any additional positional arguments to map() are treated as further iterables that get zipped together, not as individual values, so executor.map(self.thread_function, url1, url2) would iterate over the characters of each string rather than pass the URLs one by one.)

This in turn means that your function thread_function() needs to accept an argument in order to receive a value from the list: thread_function(self, url). Since the function now gets only one value of URLlist at a time, the while loop in your function makes no sense anymore, and you have to refactor it to handle a single URL instead of the whole list:

def thread_function(self, url):
    webpage = networking.getWebpage(url)

    # parse webpage or resource

    with self._resourceListLock:
        self.resourceDict[url] = webpage

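Put together, the corrected class would look something like this (a minimal, runnable sketch; networking.getWebpage is replaced by a stand-in string since that module is custom and not public):

```python
import concurrent.futures
import threading


class concurrencyTest:
    def __init__(self, URLlist):
        self.URLlist = URLlist
        self.resourceDict = {}
        self._resourceListLock = threading.Lock()

    def thread_function(self, url):
        # stand-in for networking.getWebpage(url)
        webpage = f"<contents of {url}>"
        with self._resourceListLock:
            self.resourceDict[url] = webpage

    def sendMultiThreadedRequests(self, threadNum=3):
        self.resourceDict = {}
        with concurrent.futures.ThreadPoolExecutor(max_workers=threadNum) as executor:
            # list() drains the lazy map() iterator, so this blocks until
            # all URLs are done and re-raises any worker exception here
            list(executor.map(self.thread_function, self.URLlist))


parTest = concurrencyTest(["www.google.com", "www.stackoverflow.com", "www.reddit.com"])
parTest.sendMultiThreadedRequests()
print(len(parTest.resourceDict))  # 3
```

Note that map() returns a lazy iterator, so if you never consume it (as in the original `results = executor.map(...)`) exceptions raised in the workers are silently swallowed until the results are requested.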
Alternatively, you could use submit() instead of map(), whose purpose is to schedule a single function call asynchronously. This way no modification to thread_function() is needed, but note that each submit() starts only one worker, so you would call it once per thread you want draining the list:

executor.submit(self.thread_function)
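For example, one submit() per worker thread keeps the original while-loop design intact (a standalone sketch of the same pattern, with a stand-in for networking.getWebpage):

```python
import concurrent.futures
import threading

url_list = ["www.google.com", "www.stackoverflow.com", "www.reddit.com"]
list_lock = threading.Lock()
results = {}


def worker():
    # mirrors the original thread_function: each worker drains the
    # shared list under a lock until it is empty
    while True:
        with list_lock:
            if not url_list:
                return
            url = url_list.pop()
        # stand-in for networking.getWebpage(url)
        page = f"<contents of {url}>"
        with list_lock:
            results[url] = page


thread_num = 3
with concurrent.futures.ThreadPoolExecutor(max_workers=thread_num) as executor:
    futures = [executor.submit(worker) for _ in range(thread_num)]
    for f in futures:
        f.result()  # blocks, and re-raises any exception from that worker

print(len(results))  # 3
```

Calling f.result() on each future is important: without it, an exception inside a worker disappears silently, which is exactly the "instantly finishes with no errors" symptom from the question.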


If you want to use concurrent.futures

You never pass any iterables to .map(), so nothing gets done. To simplify the stuff you have (you don't need any of the locks either):

import concurrent.futures
import random
import time
import hashlib


def get_data(url):
    print(f"Starting to get {url}")
    # to pretend doing some work:
    time.sleep(random.uniform(0.5, 1))
    result = hashlib.sha1(url.encode("utf-8")).hexdigest()  
    print(f"OK: {url}")
    return (url, result)


url_list = ["www.google.com", "www.stackoverflow.com", "www.reddit.com"]
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    results = {}
    for key, value in executor.map(get_data, url_list):
        results[key] = value
        print(f"Results acquired: {len(results)}")
    # or more simply
    # results = dict(executor.map(get_data, url_list))
    print(results)

prints out something like the following (the ordering varies because of the random sleeps):

Starting to get www.google.com
Starting to get www.stackoverflow.com
Starting to get www.reddit.com
OK: www.google.com
Results acquired: 1
OK: www.stackoverflow.com
Results acquired: 2
OK: www.reddit.com
Results acquired: 3
{'www.google.com': 'd8b99f68b208b5453b391cb0c6c3d6a9824f3c3a', 'www.stackoverflow.com': '3954ca3139369180fff4ea3ae984b9a7871b540d', 'www.reddit.com': 'f420470addba27b8577bb40e02229e90af568d69'}
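If you want to handle each result as soon as it finishes rather than in input order (which is what map() gives you), submit() plus as_completed() is the usual pattern. A sketch with a deterministic stand-in for get_data:

```python
import concurrent.futures
import hashlib


def get_data(url):
    # same shape as get_data above: return a (url, result) pair
    return (url, hashlib.sha1(url.encode("utf-8")).hexdigest())


url_list = ["www.google.com", "www.stackoverflow.com", "www.reddit.com"]
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(get_data, url) for url in url_list]
    results = {}
    # as_completed yields each future the moment it finishes,
    # regardless of submission order
    for future in concurrent.futures.as_completed(futures):
        url, value = future.result()
        results[url] = value

print(len(results))  # 3
```

With real network requests this lets a slow URL stall only its own result instead of blocking the whole in-order iteration.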

If you want to use multiprocessing

(same get_data function as above)

from multiprocessing.pool import ThreadPool, Pool
# (choose between threads or processes)

with ThreadPool(3) as p:
    results = dict(p.imap_unordered(get_data, url_list))
    print(results)

