
I'm having a heckuva time dealing with slow MySQL queries in Python. In one area of my application, "load data infile" goes quick. In another area, the select queries are VERY slow.

Executing the same query in phpMyAdmin AND Navicat (as a second test) yields a response ~5x faster than in Python.

A few notes...

  • I switched to MySQLdb as the connector and am also using SSCursor. No performance increase.
  • The database is optimized, indexed, etc. I'm porting this application to Python from PHP/CodeIgniter, where it ran fine. (I foolishly thought getting out of PHP would help speed it up.)
  • PHP/CodeIgniter executes the select queries swiftly. For example, one key aspect of the application takes ~2 seconds in PHP/CodeIgniter, but is taking 10 seconds in Python BEFORE any of the analysis of the data is done.

My link to the database is fairly standard...

dbconn = MySQLdb.connect(host="127.0.0.1", user="*", passwd="*", db="*", cursorclass=MySQLdb.cursors.SSCursor)

Any insights/help/advice would be greatly appreciated!

UPDATE

In terms of fetching/handling the results, I've tried it a few ways. The initial query is fairly standard...

# Run Query
cursor.execute(query)

I removed all of the code within this loop just to make sure it wasn't the bottleneck, and it's not. I put dummy code in its place. The entire process did not speed up at all.

db_results = "test"

# Loop Results
for row in cursor:
    a = 0  # dummy code put in to test

return db_results

The query result itself is only 501 rows (a large number of columns) and took 0.029 seconds outside of Python. It is taking significantly longer than that within Python.
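One way to see where the time goes inside Python is to time the execute and the fetch separately; a minimal sketch of such a harness (the `cursor` and `query` names are assumed from the code above):

```python
import time

def time_step(label, fn, *args, **kwargs):
    # Run a callable and report the elapsed wall-clock time.
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - t0
    print("%s took %.3fs" % (label, elapsed))
    return result, elapsed

# Usage sketch against an open MySQLdb cursor:
# _, exec_s = time_step("execute", cursor.execute, query)
# rows, fetch_s = time_step("fetchall", cursor.fetchall)
```

If `execute` is fast but the fetch is slow, the problem is in transferring or materializing the result, not in the query plan.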

The project is related to horse racing. The query is done within this function. The query itself is long; however, it runs well outside of Python. I commented out the code within the loop on purpose for testing, and left in the print(query) in hopes of figuring this out.

# Get PPs
def get_pps(race_ids):

    # Comma Race List
    race_list = ','.join(map(str, race_ids))

    # PPs Query
    query = ("SELECT raceindex.race_id, entries.entry_id, entries.prognum, runlines.line_id, runlines.track_code, runlines.race_date, runlines.race_number, runlines.horse_name, runlines.line_date, runlines.line_track, runlines.line_race, runlines.surface, runlines.distance, runlines.starters, runlines.race_grade, runlines.post_position, runlines.c1pos, runlines.c1posn, runlines.c1len, runlines.c2pos, runlines.c2posn, runlines.c2len, runlines.c3pos, runlines.c3posn, runlines.c3len, runlines.c4pos, runlines.c4posn, runlines.c4len, runlines.c5pos, runlines.c5posn, runlines.c5len, runlines.finpos, runlines.finposn, runlines.finlen, runlines.dq, runlines.dh, runlines.dqplace, runlines.beyer, runlines.weight, runlines.comment, runlines.long_comment, runlines.odds, runlines.odds_position, runlines.entries, runlines.track_variant, runlines.speed_rating, runlines.sealed_track, runlines.frac1, runlines.frac2, runlines.frac3, runlines.frac4, runlines.frac5, runlines.frac6, runlines.final_time, charts.raceshape "
             "FROM hrdb_raceindex raceindex "
             "INNER JOIN hrdb_runlines runlines ON runlines.race_date = raceindex.race_date AND runlines.track_code = raceindex.track_code AND runlines.race_number = raceindex.race_number "
             "INNER JOIN hrdb_entries entries ON entries.race_date=runlines.race_date AND entries.track_code=runlines.track_code AND entries.race_number=runlines.race_number AND entries.horse_name=runlines.horse_name "
             "LEFT JOIN hrdb_charts charts ON runlines.line_date = charts.race_date AND runlines.line_track = charts.track_code AND runlines.line_race = charts.race_number "
             "WHERE raceindex.race_id IN (" + race_list + ") "
             "ORDER BY runlines.line_date DESC;")

    print(query)

    # Run Query
    cursor.execute(query)

    # Query Fields
    fields = [i[0] for i in cursor.description]

    # PPs List
    pps = []

    # Loop Results
    for row in cursor:

        a = 0  # dummy code left in for testing
        #this_pp = {}

        #for i, value in enumerate(row):
        #    this_pp[fields[i]] = value

        #pps.append(this_pp)

    return pps

One final note... I haven't considered the ideal way to handle the result. I believe one of the cursor classes allows the result to come back as a set of dictionaries. I haven't even made it to that point yet, as the query and return itself is so slow.
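For reference, MySQLdb does ship DictCursor (and SSDictCursor) classes that return each row as a dict keyed by column name; the same effect can be had manually with the `fields` list already built in the function above. A sketch:

```python
def rows_to_dicts(fields, rows):
    # Pair each column name with its value, one dict per row.
    return [dict(zip(fields, row)) for row in rows]

# Usage sketch:
# fields = [col[0] for col in cursor.description]
# pps = rows_to_dicts(fields, cursor.fetchall())
```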

  • Often this is not a SQL issue, but an issue with how you fetch and handle the result. A few lines of code would help the Pythonites track this down. Commented Jul 17, 2013 at 13:04
  • Could you switch back to the normal cursor and recheck the query execution speed? Commented Jul 17, 2013 at 13:07
  • The normal cursor, I thought, was the reason at first. Both are producing similarly slow results. Commented Jul 17, 2013 at 13:11
  • MySQL has two providers for Python. Try the other one. Commented Jul 17, 2013 at 13:28
  • Without more of the surrounding Python code it's impossible to give any real input here. It seems likely you've got a simple bit of not quite rightness, but without being able to see it, nobody can say. Commented Jul 17, 2013 at 13:29

2 Answers


Though you have only 501 rows, it looks like you have over 50 columns. How much total data is being passed from MySQL to Python?

501 rows x 55 columns = 27,555 cells returned.

If each cell averaged "only" 1K that would be close to 27MB of data returned.

To get a sense of how much data MySQL is pushing, you can add this to your query:

SHOW SESSION STATUS LIKE "bytes_sent"
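From Python, that counter can be read before and after the query to estimate the payload; a sketch (assumes an open cursor; the parsing helper is hypothetical):

```python
def status_value(row):
    # SHOW SESSION STATUS rows come back as (name, value) tuples.
    return int(row[1])

# Usage sketch:
# cursor.execute('SHOW SESSION STATUS LIKE "Bytes_sent"')
# before = status_value(cursor.fetchone())
# cursor.execute(query); cursor.fetchall()
# cursor.execute('SHOW SESSION STATUS LIKE "Bytes_sent"')
# print("query sent ~%d bytes" % (status_value(cursor.fetchone()) - before))
```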

Is your server well-resourced? Is memory allocation well configured?

My guess is that when you are using phpMyAdmin you are getting paginated results. This masks the issue of MySQL returning more data than your server can handle. (I don't use Navicat, so I'm not sure how it returns results.)

Perhaps the Python process is memory-constrained and, when faced with this large result set, it has to page out to disk to handle it.

If you reduce the number of columns called and/or constrain the query with, say, LIMIT 10, do you get improved speed?

Can you see if the server running Python is paging to disk when this query is called? Can you see what memory is allocated to Python, how much is used during the process and how that allocation and usage compares to those same values in the PHP version?

Can you allocate more memory to your constrained resource?

Can you reduce the number of columns or rows that are called through pagination or asynchronous loading?
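Pagination on the Python side can be approximated with the DB-API `fetchmany`, which pulls the result in fixed-size batches instead of all at once; a sketch:

```python
def iter_chunks(cursor, size=500):
    # Yield batches of rows until the result set is exhausted.
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            break
        yield rows

# Usage sketch:
# for batch in iter_chunks(cursor, size=100):
#     process(batch)
```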


6 Comments

Thanks for the feedback. In terms of the setup... brand new machine with 64g RAM, heavyweight processor, etc. I will try to reduce the result count, but regardless it feels like a total disaster to be running this slow. I will report back. Thanks!
Are MySQL and Python on the same server?
Are/were MySQL and PHP on the same server?
Yeah, they're all on the same machine. Pretty salty machine too...64g ram etc.
Any new report on the questions posed in my answer?

I know this is late; however, I have run into similar issues with MySQL and Python. My solution is to run the queries in another language... I use R to make my queries, which is blindingly fast, do what I can in R, and then send the data to Python if need be for more general programming, although R has many general-purpose libraries as well. Just wanted to post something that may help someone who has a similar problem, even though I know this sidesteps the heart of the problem.

