
I am applying multithreading to a Python script to improve its performance, but I don't understand why the execution time does not improve.

This is the code snippet of my implementation:

from queue import Queue
from threading import Thread
from datetime import datetime
import time



class WP_TITLE_DOWNLOADER(Thread):
    def __init__(self, queue,name):
        Thread.__init__(self)
        self.queue = queue
        self.name = name
 
    
    def download_link(self,linkss):       
       ####some test function
       ###later some processing will be done on this list.
       #####this will be processed on CPU. 
       for idx,link in enumerate(linkss):
           ##time.sleep(0.01)
           test.append(idx)

       for idx,i in enumerate(testv):
           # list.append returns None, so its result is not assigned
           i.append(2)

    def run(self):
        while True:
            # Get the work from the queue
            linkss = self.queue.get()
            try:
                 self.download_link(linkss)
            finally:
                 self.queue.task_done()                


       
######with threading

testv=[[i for i in range(5000)] for j in range(20)]
links_list=[[i for i in range(100000)] for j in range(20)]
test=[]
start_time =time.time()
queue = Queue()
thread_count=8
for x in range(thread_count):
    worker = WP_TITLE_DOWNLOADER(queue,str(x))
    # Setting daemon to True will let the main thread exit even though the workers are blocking
    worker.daemon = True
    worker.start()




##FILL UP Queue for threads
for links in links_list: 
        queue.put(links)
        
        
        
##print("queing time={}".format(time.time()-start_time))        
#print(test)
#wait for all to end
j_time =time.time()
queue.join()
t_time = time.time()-start_time
print("With threading time={}".format(t_time))
           
    



#############without threading,  
###following function is same as the one in threading. 
test=[]
def download_link(links1):       
        for idx,link in enumerate(links1):
           ##time.sleep(0.01)
           test.append(idx)
           
        for idx,i in enumerate(testv):
           i.append(2)



start_time =time.time()
for links in links_list: 
        download_link(links)
       
        
t_time = time.time()-start_time
print("without threading time={}".format(t_time))

With threading time=0.564049482345581
without threading time=0.13332700729370117

NOTE: When I uncomment time.sleep, the threaded version is faster than the non-threaded one. My test case is a list of lists, each containing more than 10,000 elements. The idea behind using multithreading is that, instead of processing one list at a time, several lists can be processed simultaneously, which should reduce the overall time. But the results are not as expected.

3
  • Why would you use a queue when multithreading? You should run a thread to process a "link" then terminate. What you're doing here is an anti-pattern IMO Commented May 12, 2022 at 7:54
  • I think this is how multithreading with a queue is implemented. tutorialspoint.com/python/python_multithreading.htm Commented May 12, 2022 at 9:34
  • I need to process all the elements in the list, which is why I am putting them in a queue. Commented May 12, 2022 at 9:35

3 Answers

3

As a general rule (there will always be exceptions) multithreading is best suited to IO-bound processing (this includes networking). Multiprocessing is well suited to CPU-intensive activities.

Your testing is therefore flawed.

Your intention is clearly to do some kind of web crawling, but that's not happening in your test code. This means your test is CPU-intensive and therefore not suitable for multithreading. Once you've added your networking code, you may find that matters improve, provided you've used suitable techniques.

Take a look at ThreadPoolExecutor in concurrent.futures. You may find it particularly useful because you can switch to multiprocessing by simply replacing ThreadPoolExecutor with ProcessPoolExecutor, which will make your experiments easier to quantify.
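As a rough sketch of that swap: the function below takes the executor class as a parameter, so the same code can be run with threads or with processes. The names process_links and run_with are hypothetical, and process_links is only a CPU-bound stand-in for the real per-list work, not the asker's actual processing.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def process_links(links):
    """Hypothetical CPU-bound stand-in for the per-list work in the question."""
    return sum(idx for idx, _ in enumerate(links))


def run_with(executor_cls, links_list, max_workers=4):
    """Run process_links over every list using the given executor class."""
    with executor_cls(max_workers=max_workers) as executor:
        # map() keeps the input order and hands back return values,
        # so no shared global list is needed to collect results.
        return list(executor.map(process_links, links_list))


if __name__ == "__main__":
    links_list = [list(range(1000)) for _ in range(8)]
    threaded = run_with(ThreadPoolExecutor, links_list)
    multiproc = run_with(ProcessPoolExecutor, links_list)
    # Both executors should produce the same results; only the timing differs.
    print(threaded == multiproc)
```

Wrapping each call in time.perf_counter() measurements would give a like-for-like comparison. With ProcessPoolExecutor the worker function must be defined at module level so it can be pickled, and the `if __name__ == "__main__":` guard is needed on platforms that spawn new interpreters.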

2

Python has a concept called the GIL (Global Interpreter Lock). This lock ensures that only one thread executes Python bytecode at a time. Therefore, even if you spawn multiple threads to process multiple lists, only one thread is actually processing at any given moment. You can try multiprocessing for parallel execution.
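One way to see this effect is to time the same pure-Python loop run sequentially and then split across threads. This is a minimal sketch assuming a standard CPython build with the GIL enabled; count_down, time_sequential and time_threaded are hypothetical helper names, and the actual timings depend on the machine and Python version.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; the running thread holds the GIL."""
    while n > 0:
        n -= 1


def time_sequential(n, repeats):
    """Run count_down `repeats` times in the main thread and time it."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def time_threaded(n, repeats):
    """Run count_down once in each of `repeats` threads and time it."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(repeats)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n, repeats = 2_000_000, 4
    print("sequential:", time_sequential(n, repeats))
    print("threaded:  ", time_threaded(n, repeats))
```

On a GIL-enabled interpreter the threaded figure is typically no better than the sequential one, because the threads take turns rather than running in parallel.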

2 Comments

But when I run it with sleep, it gives the expected results. Does it have something to do with the TEST list?
If only one thread runs at a time, the timings should also be the same when I use the sleep function, yet in that case it works as expected. Multithreading is supposed to process the lists simultaneously. What's the point of multithreading if the threads are processed one by one?
1

Threading is awkward in Python because of the GIL (Global Interpreter Lock). Threads have to compete for the interpreter in order to compute. Threading in Python is only beneficial when the code inside the thread does not need the interpreter, i.e. when offloading computations to a hardware accelerator, when doing I/O-bound work, or when calling a non-Python library that releases the GIL. For true parallelism in Python, use multiprocessing instead. It's a bit more cumbersome, as you have to explicitly share your variables or duplicate them, and often serialize your communications.
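The point about sharing and serializing can be sketched with multiprocessing.Pool. In this illustrative example, process_chunk and the module-level results list are hypothetical names: each worker process gets its own copy of results, so the work has to be returned (and is pickled on the way back) instead of being appended to a global as in the question.

```python
from multiprocessing import Pool

results = []  # module-level list; every worker process has its own copy


def process_chunk(chunk):
    """Hypothetical stand-in for the per-list work; returns its result."""
    local = [idx for idx, _ in enumerate(chunk)]
    results.extend(local)  # only changes the copy in the current process
    return local


if __name__ == "__main__":
    chunks = [list(range(5)) for _ in range(4)]
    with Pool(processes=2) as pool:
        # Arguments and return values are pickled to cross the process boundary.
        collected = pool.map(process_chunk, chunks)
    print(len(collected))  # one returned list per chunk
    print(len(results))    # the parent's copy is not touched by the workers
```

Returning values from the worker keeps the code independent of how the processes are started, which is why a shared global like the question's test list does not carry over to multiprocessing unchanged.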

2 Comments

Why does it run fine when sleep is used? Maybe it only works for CPU-intensive tasks?
sleep releases the GIL while it waits, which lets another thread use the interpreter.
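A small sketch of why the sleep version speeds up: each thread spends its time waiting inside time.sleep, during which the GIL is released, so the waits overlap. fake_io and run_threads are hypothetical names, and time.sleep is used here only as a stand-in for blocking I/O such as a network request.

```python
import threading
import time


def fake_io(delay):
    """Stand-in for a blocking I/O call; the GIL is released while sleeping."""
    time.sleep(delay)


def run_threads(delay, count):
    """Start `count` sleeping threads and return the total wall-clock time."""
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = run_threads(0.2, 4)
    print(f"4 threads sleeping 0.2 s each took {elapsed:.2f} s in total")
```

The total should come out close to a single delay and well below the sum of all four, which matches the behaviour described in the question once time.sleep is uncommented.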
