I've traced a slowdown in my application to the execute() call in MySQLdb. I put together a simple SQL query that demonstrates the problem:

SELECT * FROM `cid444_agg_big` c WHERE 1

>>> import MySQLdb as mdb
>>> import time;
>>> 
>>> dbconn =  mdb.connect('localhost','*****','*****','*****');
>>> cursorconn = dbconn.cursor()
>>> 
>>> sql="SELECT * FROM `cid444_agg_big` c WHERE 1";
>>> 
>>> startstart=time.time();
>>> cursorconn.execute(sql);
21600L #returned 21600 records
>>> print time.time()-startstart, "for execute()"
2.86254501343 for execute() #why does this take so long?
>>> 
>>> startstart=time.time();
>>> rows = cursorconn.fetchall()
>>> print time.time()-startstart, "for fetchall()"
0.0021288394928 for fetchall() #this is very fast, no problem with fetchall()

Running this query in the mysql shell takes 0.27 seconds, about 10 times faster.

My only thought is the size of the data being returned. The query returns 21600 "wide" rows, so a lot of data is being sent to Python. The database is on localhost, so there's no network latency.

Why does this take so long?

UPDATE MORE INFORMATION

I wrote a similar script in PHP:

$c = mysql_connect ( 'localhost', '*****', '****', true );
mysql_select_db ( 'cachedata', $c );

$time_start = microtime_float();
$sql="SELECT * FROM `cid444_agg_big` c WHERE 1";
$q=mysql_query($sql);$c=0;
while($r=mysql_fetch_array($q))
$c++;//do something?
echo "Did ".$c." loops in ".(microtime_float() - $time_start)." seconds\n";


function microtime_float(){//function taken from php.net
  list($usec, $sec) = explode(" ", microtime());
  return ((float)$usec + (float)$sec);
}

This prints:

Did 21600 loops in 0.56120800971985 seconds

This loops over all the data instead of retrieving it all at once. PHP appears to be about 5 times faster than the Python version (0.56 seconds versus 2.86 seconds).

2 Answers

The default MySQLdb cursor fetches the complete result set to the client on execute, and fetchall() will just copy the data from memory to memory.

If you want to store the result set on the server and fetch it on demand, you should use SSCursor instead.

Cursor:
This is the standard Cursor class that returns rows as tuples and stores the result set in the client.

SSCursor:
This is a Cursor class that returns rows as tuples and stores the result set in the server.
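
To see where the time goes with each cursor class, the timing from the question can be wrapped in a small helper. This is only a sketch: `time_query` and `compare_cursors` are hypothetical names, not part of MySQLdb, and the batch size is an arbitrary assumption. It relies only on the DB-API calls `execute()`, `fetchmany()` and `close()`, plus the `MySQLdb.cursors.Cursor` and `MySQLdb.cursors.SSCursor` classes described above.

```python
import time


def time_query(conn, sql, cursor_class=None, batch_size=1000):
    """Time execute() and the fetch loop separately for one query.

    Returns (execute_seconds, fetch_seconds, row_count).
    """
    # With no cursor_class, the connection's default cursor class is used.
    if cursor_class is not None:
        cursor = conn.cursor(cursor_class)
    else:
        cursor = conn.cursor()
    try:
        start = time.time()
        cursor.execute(sql)
        execute_seconds = time.time() - start

        start = time.time()
        row_count = 0
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            row_count += len(rows)
        fetch_seconds = time.time() - start
    finally:
        # Always release the cursor, even if the query fails.
        cursor.close()
    return execute_seconds, fetch_seconds, row_count


def compare_cursors(conn, sql):
    """Run the same query with both cursor classes.

    Needs a live MySQLdb connection; the import is done lazily so that
    time_query() itself has no hard dependency on MySQLdb.
    """
    import MySQLdb.cursors

    return {
        "Cursor": time_query(conn, sql, MySQLdb.cursors.Cursor),
        "SSCursor": time_query(conn, sql, MySQLdb.cursors.SSCursor),
    }
```

Calling `compare_cursors(dbconn, sql)` with the connection and query from the question should show the transfer cost in execute() for Cursor and in the fetch loop for SSCursor. The total is not expected to drop much if every row is read anyway.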

3 Comments

I'm looping on the data following this and using all of it. Am I correct in assuming that running execute() in my situation and fetching it all would be roughly equivalent to your method with SSCursor and fetching records on demand presuming I need all of it? Would SSCursor provide a substantial gain in terms of time?
@Landon If you're fetching all data anyway, I would assume that Cursor is faster, since it can transfer all data without regard to any access pattern. SSCursor is mainly better for queries that don't necessarily use all data, or for large result sets to save memory.
It's worth noting that with SSCursor you can't execute a new query on the same connection until you have fetched all the rows or closed the present cursor.
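
One way to live with that restriction is to wrap the cursor in a generator that always closes it, so the connection is released as soon as the caller stops reading. This is a sketch under assumptions: `iter_rows` is a hypothetical helper, not a MySQLdb function, and the batch size is arbitrary. Pass `MySQLdb.cursors.SSCursor` as `cursor_class` to stream rows from the server.

```python
def iter_rows(conn, sql, cursor_class=None, batch_size=500):
    """Yield rows one at a time and close the cursor when iteration ends."""
    if cursor_class is not None:
        cursor = conn.cursor(cursor_class)
    else:
        cursor = conn.cursor()
    try:
        cursor.execute(sql)
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            for row in rows:
                yield row
    finally:
        # Runs on normal exhaustion, on errors, and when the caller
        # calls .close() on the generator after stopping early.
        cursor.close()
```

If the loop is abandoned early, calling `.close()` on the generator runs the `finally` block and closes the cursor before the next query is issued.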
This is a very old discussion, but I'll add my two cents. My script had to select from many rows by timestamp. With the standard setup (id index, name, timestamp) it was very slow (I didn't measure it, but it took many minutes). After I added an index on timestamp as well, the query took under 10 seconds. Much better.

"ALTER TABLE BTC ADD INDEX(timestamp)"

I hope this helps.
