
I'm querying a few date ranges in a loop:

import MySQLdb

con = MySQLdb.connect(host='xxx.com', port=3306, user='user', passwd='pass', db='db')

intervals = 10
precision = 1

def load_data_window(start, precision):
    length = start + precision  # <-- end of the time period, in hours
    limit = 20000  # currently unused
    cur = con.cursor()
    sql = "SELECT data FROM table WHERE date < DATE_SUB(NOW(), INTERVAL %s HOUR) AND date > DATE_SUB(NOW(), INTERVAL %s HOUR)"
    cur.execute(sql, (start, length))
    return cur.fetchall()

dateChunks = list()
for i in range(0, intervals, precision):
    results = load_data_window(i, precision)
    dateChunks.append(results)

It works lightning fast for the first few loops, sometimes even all of them, but bogs down significantly from time to time. There are other actions happening on the database from different places... Is there something I can do to ensure my query has priority? Something like a transaction?

######### EDIT

I did notice that if I move the con = MySQLdb... inside the load_data_window function, I get really fast results and then bogging, whereas if I keep the con = MySQLdb... outside, it is consistently slower...

1 Answer


Does your date column have an index on it? Such an index may help performance a great deal. That's especially true if your queries progressively slow down: MySQL may be scanning through your table, further and further for each successive hour, looking for the first row to return.
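
If it's missing, adding one is a single statement (assuming the table and column really are named table and date, as in your snippet; table is a reserved word in MySQL, hence the backticks):

ALTER TABLE `table` ADD INDEX idx_date (`date`);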

It looks like you're fetching an hour's worth of data with each request, and running fetchall() to slurp all that data into RAM in your python program at once. Is that what you want? If you have an hour with lots of results in it, you may hammer the RAM in your python program, forcing garbage collection and even operating-system thrashing.

Can you, instead, cause your python program to iterate over the rows in the result set? This will most likely take a lot less RAM. You can do something like this: http://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-fetchone.html

cur.execute(sql, (start, length))
for row in cur:  # iterate over the cursor instead of calling fetchall()
    print(row)
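
Note that MySQLdb's default cursor still buffers the entire result set on the client when execute() runs, so to genuinely stream rows from the server you want a server-side cursor. A minimal sketch, assuming the same con and query as above (process() is a hypothetical per-row handler):

import MySQLdb.cursors

cur = con.cursor(MySQLdb.cursors.SSCursor)  # server-side cursor: rows stream as you iterate
cur.execute(sql, (start, length))
for row in cur:
    process(row)  # hypothetical handler for one row
cur.close()  # read or close the result fully before reusing the connection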

This sort of strategy would also allow you to process your data efficiently in chunks bigger than an hour if that would help your application. In many, but not all, cases fewer queries means better queries.
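
If chunked processing helps, the standard DB-API fetchmany() gives you bounded batches; a minimal sketch, reusing cur, sql, start, and length from the question (process() is again a hypothetical handler):

cur.execute(sql, (start, length))
while True:
    rows = cur.fetchmany(1000)  # pull a bounded batch at a time; 1000 is an arbitrary size
    if not rows:
        break
    for row in rows:
        process(row)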

Do some of your hours have hugely more rows in them? This could be causing the unpredictable performance.
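
A quick way to check, again assuming the question's table and column names:

SELECT DATE_FORMAT(date, '%Y-%m-%d %H:00') AS hr, COUNT(*) AS n
FROM `table`
GROUP BY hr
ORDER BY n DESC;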

Finally, you have an off-by-one bug in your query. It should say this...

 WHERE ...  AND date >= DATE_SUB(NOW(), INTERVAL %s HOUR)

(Notice the >= replacing your >)


1 Comment

Ollie, thank you for your input. I'm giving you a point for all the great feedback and suggestions. I'm leaving the question open just in case someone has some more insights. It's a pretty open-ended question, so thank you for your time.
