Asynchronous file downloads in Python

Question

I'm trying to find a way to download multiple files asynchronously in Python (2.6), preferably via the Requests module. Gevent and Twisted would also be acceptable, as I'll be learning them in the near future.

My application needs to download 40+ files in a short period of time. I want to continuously download all the files 4 at a time: every time one download completes, another one starts, so there are always 4 in progress. Is this possible?
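Keeping "always 4 in flight" is what a fixed-size thread pool provides. Here is a minimal sketch for Python 3.2 and later (not Python 2.6) using `concurrent.futures.ThreadPoolExecutor(max_workers=4)`, which starts the next queued job as soon as a worker frees up. `download_all` and its `fetch` parameter are hypothetical names. `fetch` stands in for whatever function actually downloads one URL.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(urls, fetch, max_workers=4):
    """Call fetch(url) for each URL with at most max_workers running at once.

    `fetch` is a hypothetical placeholder for your real download function.
    Returns a {url: result} dict; a failed download maps to the exception
    it raised.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Every URL is submitted up front, but only max_workers jobs run at a
        # time; when one finishes, the pool starts the next queued URL.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                results[url] = exc  # record the failure and keep going
    return results
```

For example, `download_all(urls, lambda u: urllib.request.urlretrieve(u, u.split('/')[-1]))` would save each file into the current directory, four at a time.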

Anurag Uniyal · Accepted Answer · 2013-09-19 18:20:20Z


You don't need an external library or framework for such a simple task: put the list of URLs in a queue, start 4 threads, and have each thread take an item from the queue and download it.

Something like this:

import sys
import os
import urllib
import threading
from Queue import Queue

class DownloadThread(threading.Thread):
    def __init__(self, queue, destfolder):
        super(DownloadThread, self).__init__()
        self.queue = queue
        self.destfolder = destfolder
        self.daemon = True  # daemon threads don't keep the process alive after queue.join() returns

    def run(self):
        while True:
            url = self.queue.get()
            try:
                self.download_url(url)
            except Exception as e:
                print "   Error: %s"%e
            finally:
                # mark the item done even on failure so queue.join() can return
                self.queue.task_done()

    def download_url(self, url):
        # name the local file after the last URL path segment; change this if needed
        name = url.split('/')[-1]
        dest = os.path.join(self.destfolder, name)
        print "[%s] Downloading %s -> %s"%(self.ident, url, dest)
        urllib.urlretrieve(url, dest)

def download(urls, destfolder, numthreads=4):
    queue = Queue()
    for url in urls:
        queue.put(url)

    for i in range(numthreads):
        t = DownloadThread(queue, destfolder)
        t.start()

    queue.join()  # block until every queued URL has been processed

if __name__ == "__main__":
    download(sys.argv[1:], "/tmp")

Usage:

$ python download.py http://en.wikipedia.org/wiki/1 http://en.wikipedia.org/wiki/2 http://en.wikipedia.org/wiki/3 http://en.wikipedia.org/wiki/4
[4456497152] Downloading http://en.wikipedia.org/wiki/1 -> /tmp/1
[4457033728] Downloading http://en.wikipedia.org/wiki/2 -> /tmp/2
[4457701376] Downloading http://en.wikipedia.org/wiki/3 -> /tmp/3
[4458258432] Downloading http://en.wikipedia.org/wiki/4 -> /tmp/4
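On Python 3 the `Queue` module was renamed to `queue`, `urllib.urlretrieve` moved to `urllib.request.urlretrieve`, and `print` became a function. A sketch of the same design ported to Python 3 might look like this. It is intended to behave like the answer above. The `finally` clause marks each item done even if the download raises, so `q.join()` cannot hang.

```python
import os
import queue
import threading
import urllib.request


class DownloadThread(threading.Thread):
    def __init__(self, q, destfolder):
        super().__init__(daemon=True)  # idle workers won't block interpreter exit
        self.queue = q
        self.destfolder = destfolder

    def run(self):
        while True:
            url = self.queue.get()
            try:
                self.download_url(url)
            except Exception as e:
                print("   Error: %s" % e)
            finally:
                self.queue.task_done()  # always mark done so join() can return

    def download_url(self, url):
        # name the local file after the last path segment of the URL
        name = url.split('/')[-1]
        dest = os.path.join(self.destfolder, name)
        print("[%s] Downloading %s -> %s" % (self.ident, url, dest))
        urllib.request.urlretrieve(url, dest)


def download(urls, destfolder, numthreads=4):
    q = queue.Queue()
    for url in urls:
        q.put(url)
    for _ in range(numthreads):
        DownloadThread(q, destfolder).start()
    q.join()  # blocks until every queued URL has been processed
```

It is called the same way as the original, e.g. `download(sys.argv[1:], "/tmp")`.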

answered Sep 18, 2013 at 23:26 by Anurag Uniyal · edited Sep 19, 2013 at 18:20

4 Comments

user2963977 Over a year ago

But that's not asynchronous, right? We're blocking the thread until the file is downloaded.

Anurag Uniyal Over a year ago

@user2963977 Downloading in a separate thread is asynchronous with respect to the main thread. In the example there was nothing else to do, so we waited, but you could do something else, e.g. show stats to the user or play tic-tac-toe with them until the files download.
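To illustrate the main thread staying free, here is a Python 3 sketch in which the workers exit once the queue is empty. The main thread can then poll `Thread.is_alive()` and print progress instead of blocking in `queue.join()`. `download_with_progress` and `fetch` are hypothetical names; `fetch` stands in for a real download function.

```python
import queue
import threading
import time


def download_with_progress(urls, fetch, numthreads=4, poll=0.05):
    """Run fetch(url) on worker threads while the main thread prints progress.

    `fetch` is a hypothetical placeholder for a real download callable.
    Returns the sorted list of URLs that were processed.
    """
    q = queue.Queue()
    for url in urls:
        q.put(url)
    done = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = q.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker exits
            try:
                fetch(url)
            except Exception as exc:
                print("   Error: %s" % exc)
            with lock:
                done.append(url)

    threads = [threading.Thread(target=worker) for _ in range(numthreads)]
    for t in threads:
        t.start()
    # While the workers wait on I/O, the main thread is free to do other
    # work; here it just reports how many URLs have finished so far.
    while any(t.is_alive() for t in threads):
        with lock:
            print("progress: %d/%d" % (len(done), len(urls)))
        time.sleep(poll)
    for t in threads:
        t.join()
    return sorted(done)
```

Unlike the daemon workers in the answer, which block on `get()` forever, these threads terminate. That is what lets `is_alive()` signal completion.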

0xc0de Over a year ago

Wait, what? With the GIL, the program isn't going to do anything else, right?

Anurag Uniyal Over a year ago

@0xc0de The GIL doesn't mean that multithreading doesn't work. The GIL only affects computation, not blocking I/O calls: CPython releases it while a thread waits on I/O.
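A quick way to check this claim is the following sketch. It assumes `time.sleep` is a fair stand-in for a blocking network read, since CPython releases the GIL for both. If the GIL serialized the waits, four 0.2 s tasks would take about 0.8 s; overlapping threads finish in roughly 0.2 s. `run_in_threads` and `fake_io` are hypothetical helper names.

```python
import threading
import time


def run_in_threads(task, n):
    """Start n threads running task() and return the wall-clock seconds taken."""
    threads = [threading.Thread(target=task) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def fake_io():
    # stands in for a blocking download; time.sleep releases the GIL
    time.sleep(0.2)


if __name__ == "__main__":
    # The four sleeps overlap, so this typically prints a value near 0.2 s.
    print("4 threads: %.2f s" % run_in_threads(fake_io, 4))
```

The same experiment with a CPU-bound `task` (e.g. a tight arithmetic loop) would not show this speedup, which is the distinction the comment draws.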
