15

MySQL Version: 5.5.37-0ubuntu0.14.04.1

I am currently writing a python script which makes use of numerous MySQL tables and queries to obtain results from an inverted index stored in tables.

I have noticed that choosing the right type of Cursor in the MySQLdb python module when executing a query has a really big effect on performance and was wondering if someone could explain or provide a reliable resource explaining which cursor to use when.

As an example, executing this query 40 times with SSCursor takes 7 seconds:

SELECT Pages.PageID,
       Pages.PageName,
       Counter AS TermFreq,
       Pages.Length,
       (Counter / LOG(Length)) AS Weight
FROM Pages
INNER JOIN TermOccurrences ON TermOccurrences.PageID = Pages.PageID
INNER JOIN Terms ON TermOccurrences.TermID = Terms.TermID
WHERE TermName = %s
ORDER BY Weight DESC
LIMIT 20;

Running the same query 40 times with the default Cursor takes 0.004 seconds.

Removing the calculation of weight (Counter/LOG(Length)) makes this query execute fast again using an SSCursor however.

I was using SSCursor because it was proving to have vastly superior performance on a number of other queries and then suddenly became very slow for this one. Changing back to the default Cursor surprised me when it executed so fast.

EDIT: Some more examples.

Running the following with the default cursor 40 times takes ~3 seconds:

SELECT COUNT(*)
FROM Pages
INNER JOIN TermOccurrences ON TermOccurrences.PageID = Pages.PageID
INNER JOIN Terms ON TermOccurrences.TermID = Terms.TermID
WHERE TermName = %s AND Counter > 2

Running it with SSCursor instead takes about 0.002 seconds.

8
  • 3
    That's an excellent question, actually. Glancing at Python-MySQL's source, I couldn't find anything that'd make this that much slower... All I can think of is some sort of communication overhead or a MySQL bug that causes it to maybe recalculate the result set for each row fetched. Commented Jul 17, 2014 at 9:07
  • 1
    How did you measure performance differences? Commented Jul 24, 2014 at 11:41
  • Im using a simple timer (time module in python) Commented Jul 24, 2014 at 13:08
  • Can you check, please, whether it takes the same time for each one of the 40 iterations, or whether there is some startup overhead. Commented Jul 24, 2014 at 20:40
  • If you are referring to the role of the Key Cache, it does play some part, there is still noticeable difference between the different uses of cursors across different queries. Is there some authoritative resource about the differences between the different types of cursors available? Commented Jul 25, 2014 at 7:05

1 Answer 1

9
+50

The MySQLdb docs mention that the standard Cursor class is using mysql_store_result() whereas SSCursor is using mysql_use_result() "with the latter you must ensure all the rows have been read before another query can be executed".

So it's about mysql_store_result() vs mysql_use_result().

The MySQL docs mention that mysql_use_result() initiates a result set retrieval without actually reading the result set into the client, like mysql_store_result() does. So each row has to be retrieved individually with calls to mysql_fetch_row(), which of course can add up considerably when dealing with large tables.

Also in the MySQLdb docs:

SSCursor: A "server-side" cursor. Like Cursor but uses CursorUseResultMixIn. Use only if you are dealing with potentially large result sets.

So SSCursor is mainly good if your result set is too large to move into your client all at once.

See also these questions:

And note that a LIMIT 20 query can never really be that large. You might have to check your KEYs. To get a better idea about why that could take 7 seconds it's probably best to also include the db schema in a question, probably more something for DBA stack.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.