
I'm inexperienced at threading in Python and was trying to write a few simple multithreaded programs to get a bit more experience. I'm trying to send requests out to a pre-defined list of URLs.

When I try to execute the program, it instantly finishes and prints "End" with no exits or exceptions. The print call placed in thread_function never executes and no errors are thrown.

Any help would be appreciated.

import networking
import threading
import concurrent.futures

class concurrencyTest:
    def __init__(self, URLlist):
        self.URLlist = URLlist
        self.resourceDict = {}


        self._urlListLock = threading.Lock()
        self._resourceListLock = threading.Lock()

    def sendMultiThreadedRequests(self, threadNum=3):
        self.resourceDict = {}
        with concurrent.futures.ThreadPoolExecutor(max_workers=threadNum) as executor:
            results = executor.map(self.thread_function)
        

    
    def thread_function(self):
        print("You are in the thread_function")
        while True:
            with self._urlListLock:
                numOfRemainingURL = len(self.URLlist)
                print(numOfRemainingURL)
                if numOfRemainingURL == 0:
                    return

                urlToRequest = self.URLlist.pop()

            webpage = networking.getWebpage(urlToRequest)
            ##parse webpage or resource
            
            with self._resourceListLock:
                self.resourceDict[urlToRequest] = webpage
   
    
    def sendRegularRequests(self):
        self.resourceDict = {}
        for url in self.URLlist:
            resource = networking.getWebpage(url)
            self.resourceDict[url] = resource

    def updateURLpool(self):
        return "Not currently coded"


                  
def main():
    #The real urlList is a lot larger than just 3 URLs
    urlList = ["www.google.com","www.stackoverflow.com","www.reddit.com"]

    parTest = concurrencyTest(urlList)

    parTest.sendMultiThreadedRequests()
    
    print("End")

main()

  • Don't roll your own threading stuff unless you really want the exercise, just use multiprocessing. Commented Aug 5, 2020 at 16:32
  • Where does the networking module with getWebpage(..) come from? Can't find any reference to it online. Commented Aug 5, 2020 at 16:44
  • Just a custom module I wrote, so you wouldn't find it online Commented Aug 5, 2020 at 16:51
  • @user5423 If any of the answers provided was helpful please accept it, so others know that your question was answered. Commented Aug 5, 2020 at 18:07

2 Answers


executor.map() is for mapping the values of an iterable to a function call: it expects the function as the first argument and one or more iterables (e.g. a list) as further arguments, and maps their contents onto the function.

For example:

executor.map(self.thread_function, self.URLlist)

will call thread_function(url) once for each value in URLlist. (Note that any additional positional arguments to map() are treated as further iterables that get zipped together, not as individual values, so executor.map(self.thread_function, url1, url2) would iterate over the characters of each string rather than pass the URLs one by one.)

This in turn means that your function thread_function() needs to accept an argument in order to receive a value from the list: thread_function(self, url). Since the function now gets only one value of URLlist at a time, the while loop in your function makes no sense anymore, and you have to refactor it to handle a single URL instead of the whole list:

def thread_function(self, url):
    webpage = networking.getWebpage(url)

    # parse webpage or resource

    with self._resourceListLock:
        self.resourceDict[url] = webpage

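Put together, the corrected class would look something like this (a minimal, runnable sketch; networking.getWebpage is replaced by a stand-in string since that module is custom and not public):

```python
import concurrent.futures
import threading


class concurrencyTest:
    def __init__(self, URLlist):
        self.URLlist = URLlist
        self.resourceDict = {}
        self._resourceListLock = threading.Lock()

    def thread_function(self, url):
        # stand-in for networking.getWebpage(url)
        webpage = f"<contents of {url}>"
        with self._resourceListLock:
            self.resourceDict[url] = webpage

    def sendMultiThreadedRequests(self, threadNum=3):
        self.resourceDict = {}
        with concurrent.futures.ThreadPoolExecutor(max_workers=threadNum) as executor:
            # list() drains the lazy map() iterator, so this blocks until
            # all URLs are done and re-raises any worker exception here
            list(executor.map(self.thread_function, self.URLlist))


parTest = concurrencyTest(["www.google.com", "www.stackoverflow.com", "www.reddit.com"])
parTest.sendMultiThreadedRequests()
print(len(parTest.resourceDict))  # 3
```

Note that map() returns a lazy iterator, so if you never consume it (as in the original `results = executor.map(...)`) exceptions raised in the workers are silently swallowed until the results are requested.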
Alternatively, you could use submit() instead of map(), whose purpose is to schedule a single function call asynchronously. This way no modification to thread_function() is needed, but note that each submit() starts only one worker, so you would call it once per thread you want draining the list:

executor.submit(self.thread_function)
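For example, one submit() per worker thread keeps the original while-loop design intact (a standalone sketch of the same pattern, with a stand-in for networking.getWebpage):

```python
import concurrent.futures
import threading

url_list = ["www.google.com", "www.stackoverflow.com", "www.reddit.com"]
list_lock = threading.Lock()
results = {}


def worker():
    # mirrors the original thread_function: each worker drains the
    # shared list under a lock until it is empty
    while True:
        with list_lock:
            if not url_list:
                return
            url = url_list.pop()
        # stand-in for networking.getWebpage(url)
        page = f"<contents of {url}>"
        with list_lock:
            results[url] = page


thread_num = 3
with concurrent.futures.ThreadPoolExecutor(max_workers=thread_num) as executor:
    futures = [executor.submit(worker) for _ in range(thread_num)]
    for f in futures:
        f.result()  # blocks, and re-raises any exception from that worker

print(len(results))  # 3
```

Calling f.result() on each future is important: without it, an exception inside a worker disappears silently, which is exactly the "instantly finishes with no errors" symptom from the question.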


If you want to use concurrent.futures

You never pass any iterables to .map(), so nothing gets done. To simplify the stuff you have (you don't need any of the locks either):

import concurrent.futures
import random
import time
import hashlib


def get_data(url):
    print(f"Starting to get {url}")
    # to pretend doing some work:
    time.sleep(random.uniform(0.5, 1))
    result = hashlib.sha1(url.encode("utf-8")).hexdigest()  
    print(f"OK: {url}")
    return (url, result)


url_list = ["www.google.com", "www.stackoverflow.com", "www.reddit.com"]
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    results = {}
    for key, value in executor.map(get_data, url_list):
        results[key] = value
        print(f"Results acquired: {len(results)}")
    # or more simply
    # results = dict(executor.map(get_data, url_list))
    print(results)

prints out something like the following (the ordering varies because of the random sleeps):

Starting to get www.google.com
Starting to get www.stackoverflow.com
Starting to get www.reddit.com
OK: www.google.com
Results acquired: 1
OK: www.stackoverflow.com
Results acquired: 2
OK: www.reddit.com
Results acquired: 3
{'www.google.com': 'd8b99f68b208b5453b391cb0c6c3d6a9824f3c3a', 'www.stackoverflow.com': '3954ca3139369180fff4ea3ae984b9a7871b540d', 'www.reddit.com': 'f420470addba27b8577bb40e02229e90af568d69'}
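If you want to handle each result as soon as it finishes rather than in input order (which is what map() gives you), submit() plus as_completed() is the usual pattern. A sketch with a deterministic stand-in for get_data:

```python
import concurrent.futures
import hashlib


def get_data(url):
    # same shape as get_data above: return a (url, result) pair
    return (url, hashlib.sha1(url.encode("utf-8")).hexdigest())


url_list = ["www.google.com", "www.stackoverflow.com", "www.reddit.com"]
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(get_data, url) for url in url_list]
    results = {}
    # as_completed yields each future the moment it finishes,
    # regardless of submission order
    for future in concurrent.futures.as_completed(futures):
        url, value = future.result()
        results[url] = value

print(len(results))  # 3
```

With real network requests this lets a slow URL stall only its own result instead of blocking the whole in-order iteration.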

If you want to use multiprocessing

(same get_data function as above)

from multiprocessing.pool import ThreadPool, Pool
# (choose between threads or processes)

with ThreadPool(3) as p:
    results = dict(p.imap_unordered(get_data, url_list))
    print(results)

