
I am new to Python and have written the threaded script below. It takes each line of a file and passes it to the get_result function, which should output the URL and status code if the status code is 200 or 301.

The code is as follows:

import requests
import Queue
import threading
import re
import time

start_time = int(time.time())
regex_to_use = re.compile(r"^")


def get_result(q, partial_url):
    partial_url = regex_to_use.sub("%s" % "http://www.domain.com/", partial_url)
    r = requests.get(partial_url)
    status = r.status_code
    #result = "nothing"
    if status == 200 or status == 301:
        result = str(status) + " " + partial_url
        print(result)


#need list of urls from file
file_list = [line.strip() for line in open('/home/shares/inbound/seo/feb-404s/list.csv', 'r')]
q = Queue.Queue()
for url in file_list:
    #for each partial. send to the processing function get_result
    t = threading.Thread(target=get_result, args=(q, url))
    t.start()

end_time = int(time.time())
exec_time = end_time - start_time
print("execution time was " + str(exec_time))

I used Queue and threading, but the "execution time was x" line is being printed before the threads have finished outputting their data.

I.e. typical output is:

200 www.domain.com/ok-url
200 www.domain.com/ok-url-1
200 www.domain.com/ok-url-2
execution time was 3
200 www.domain.com/ok-url-4
200 www.domain.com/ok-ur-5
200 www.domain.com/ok-url-6

Why is this happening, and how can I make the execution time print at the end of the script, i.e. once all URLs have been processed and their output printed?
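
To see the race in isolation, here is a minimal sketch (with a made-up worker that just sleeps, standing in for the HTTP request) showing that Thread.start() returns immediately rather than waiting for the target function to finish:

import threading
import time


def worker(n):
    time.sleep(1)  # stand-in for the slow requests.get() call
    print("worker " + str(n) + " done")


for i in range(3):
    threading.Thread(target=worker, args=(i,)).start()

# start() does not block, so this prints before any worker finishes.
print("main thread reached the end")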

Thanks to the answer given below by utdemir, here's the updated code with join.

import requests
import Queue
import threading
import re
import time

start_time = int(time.time())
regex_to_use = re.compile(r"^")


def get_result(q, partial_url):
    partial_url = regex_to_use.sub("%s" % "http://www.domain.com/", partial_url)
    r = requests.get(partial_url)
    status = r.status_code
    #result = "nothing"
    if status == 200 or status == 301:
        result = str(status) + " " + partial_url
        print(result)


#need list of urls from file
file_list = [line.strip() for line in open('/home/shares/inbound/seo/feb-404s/list.csv', 'r')]
q = Queue.Queue()
threads_list = []

for url in file_list:
    #for each partial. send to the processing function get_result
    t = threading.Thread(target=get_result, args=(q, url))
    threads_list.append(t)
    t.start()

for thread in threads_list:
    thread.join()


end_time = int(time.time())
exec_time = end_time - start_time
print("execution time was " + str(exec_time))
  • You start the threads and then continue with execution. You do NOT wait for them to finish, so "execution time was X" is printed before they (or at least some of them) have finished. To wait for a thread to finish, use thread.join() Commented Feb 11, 2014 at 11:48

1 Answer


You should join the threads to wait for them; otherwise they will continue executing in the background.

Like this:

threads = []
for url in file_list:
    ...
    threads.append(t)

for thread in threads:
    thread.join() # Wait until each thread terminates

end_time = int(time.time())
...
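
If you would rather not manage the threads by hand, here is a sketch of the same wait-for-everything pattern using concurrent.futures (in the standard library on Python 3, where Queue becomes queue; on Python 2 it needs the futures backport). It assumes a one-argument variant of get_result:

import time
from concurrent.futures import ThreadPoolExecutor

start_time = time.time()

# Assumes: def get_result(partial_url): ...
with ThreadPoolExecutor(max_workers=10) as executor:
    executor.map(get_result, file_list)

# Leaving the with-block waits for every task to finish,
# so the timing below covers the whole run.
print("execution time was " + str(int(time.time() - start_time)))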

1 Comment

Thanks very much, that works beautifully. I'll post the new code up above and mark this as the answer.
