
I'm downloading relatively large files (about 10 MB each) with urllib2, parsing each one as JSON, inserting the data into a MySQL database, and then repeating the process in an infinite loop. Each download takes a minute or so, and only then does everything get loaded into MySQL. Is there a way in Python to create a thread that does the downloading while the main thread inserts into MySQL?

my pseudocode:

while 1:
 download file with urllib2
 decode as json file
 extract data I want
 do some computations on data
 insert data into mysql

Thank you so much!

  • Twisted. Twisted is always the answer. Commented Jan 19, 2012 at 6:05
  • Twisted (as Ignacio said) or Tornado; both are asynchronous. Commented Jan 19, 2012 at 6:29

1 Answer


You could use threads and queues. A file I/O thread would download and process each file and put the result on a queue; a database I/O thread would then take each result off the queue and insert it. Rather than code up an example, I'll direct you here: http://www.ibm.com/developerworks/aix/library/au-threadingpython/
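For readers who want something concrete, here is a minimal sketch of the thread-and-queue idea, not the code from the linked article. It assumes Python 3 (`queue` instead of Python 2's `Queue`), and the names `downloader`, `inserter`, `run_pipeline`, `fetch`, and `insert` are hypothetical. The download and the database write are passed in as plain callables so the sketch makes no claims about your urllib2 or MySQL code.

```python
import json
import queue
import threading

_STOP = object()  # sentinel that tells the consumer there is no more work


def downloader(urls, fetch, out_queue):
    """Producer: fetch each URL, decode the JSON, and queue the result."""
    try:
        for url in urls:
            raw = fetch(url)                 # hypothetical: returns JSON text
            out_queue.put(json.loads(raw))   # blocks while the queue is full
    finally:
        out_queue.put(_STOP)                 # always release the consumer


def inserter(in_queue, insert):
    """Consumer: take decoded records off the queue and store them."""
    while True:
        item = in_queue.get()
        if item is _STOP:
            break
        insert(item)                         # hypothetical: writes to the DB


def run_pipeline(urls, fetch, insert, max_pending=2):
    # A bounded queue keeps the downloader from running far ahead of the
    # inserts and holding many large files in memory.
    q = queue.Queue(maxsize=max_pending)
    producer = threading.Thread(target=downloader, args=(urls, fetch, q))
    producer.start()
    inserter(q, insert)                      # the main thread does the inserts
    producer.join()
```

In real use, `fetch` might wrap `urllib.request.urlopen(url).read()` and `insert` might wrap a cursor's `execute` call; both are left to the caller here. While the main thread is busy inserting one file's data, the producer thread is already downloading the next one.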

Alternatively, you could use Python's select module to manage multiple read operations in a single thread and handle each one as it becomes ready: http://docs.python.org/library/select.html
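As a rough illustration of what `select` does, the sketch below waits on several sockets and reads only from those that already have data. It uses plain sockets created with `socket.socketpair()` as stand-ins for network connections, which is an assumption made to keep the example self-contained; the helper name `read_ready` is hypothetical, and wiring this up to real HTTP downloads takes more work than shown here.

```python
import select
import socket


def read_ready(socks, timeout=1.0):
    """Return {socket: bytes} for every socket that is readable right now."""
    # select() blocks until at least one socket is readable or the timeout
    # expires, then returns the readable subset.
    readable, _, _ = select.select(socks, [], [], timeout)
    return {s: s.recv(4096) for s in readable}


if __name__ == "__main__":
    # Two local socket pairs stand in for two in-flight downloads.
    a_send, a_recv = socket.socketpair()
    b_send, b_recv = socket.socketpair()
    b_send.sendall(b"second finished first")
    for sock, data in read_ready([a_recv, b_recv]).items():
        print(sock.fileno(), data)
    for s in (a_send, a_recv, b_send, b_recv):
        s.close()
```

Only the second connection is reported, because nothing has been written to the first one yet.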


4 Comments

Maybe multiprocessing (or a process pool) could be safer.
A good point for cases where multiple processors would be useful, that is, CPU-bound operations. In an I/O-bound situation the extra processors aren't the concern; keeping the I/O from blocking is.
Awesome, just what I was looking for. So I just have to modify the "run" function in the given example with my own code?
Right -- a Python thread's start method hands off to its run method, where you define the work the thread will do. So you would likely have one thread class whose run method downloads and processes the files, and another thread class whose run method does the database inserts. And as the example demonstrates, it is easy to set up a thread pool so that you can download or insert multiple files at the same time.
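The run-method pattern described in the last comment could look roughly like this. It is a sketch assuming Python 3; the class name `Worker`, the function `run_pool`, and the `work` callable are hypothetical, and `None` is reserved as the stop marker, so real tasks must not be `None`.

```python
import queue
import threading


class Worker(threading.Thread):
    """Thread whose run() applies `work` to each queued task until told to stop."""

    def __init__(self, tasks, work):
        super().__init__()
        self.tasks = tasks
        self.work = work

    def run(self):
        # start() arranges for run() to execute in the new thread.
        while True:
            task = self.tasks.get()
            if task is None:      # stop marker
                break
            self.work(task)


def run_pool(items, work, num_workers=3):
    """Process `items` with a small pool of Worker threads."""
    tasks = queue.Queue()
    workers = [Worker(tasks, work) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for item in items:
        tasks.put(item)
    for _ in workers:
        tasks.put(None)           # one stop marker per worker
    for w in workers:
        w.join()
```

With `work` set to a download-and-parse function, `run_pool` fetches several files at once; a second pool, or the main thread, can handle the inserts in the same way.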
