
I am trying to achieve the following using Python and the MySQLdb interface:

  1. Read the contents of a table that has a few million rows.
  2. Process and modify the output of every row.
  3. Put the modified rows into another table.

It seems sensible to me to iterate over each row, process on-the-fly and then insert each new row into the new table on-the-fly.

This works:

import MySQLdb
import MySQLdb.cursors

conn=MySQLdb.connect(
    host="somehost",user="someuser",
    passwd="somepassword",db="somedb")

cursor1 = conn.cursor(MySQLdb.cursors.Cursor)
query1 = "SELECT * FROM table1"
cursor1.execute(query1)

cursor2 = conn.cursor(MySQLdb.cursors.Cursor)

for row in cursor1:
    values = some_function(row)
    query2 = "INSERT INTO table2 VALUES (%s, %s, %s)"
    cursor2.execute(query2, values)

cursor2.close()
cursor1.close()
conn.commit()
conn.close()

But this is slow and memory-consuming since it's using a client-side cursor for the SELECT query. If I instead use a server-side cursor for the SELECT query:

cursor1 = conn.cursor(MySQLdb.cursors.SSCursor)

Then I get a 2014 error:

Exception _mysql_exceptions.ProgrammingError: (2014, "Commands out of sync; you can't run this command now") in <bound method SSCursor.__del__ of <MySQLdb.cursors.SSCursor object at 0x925d6ec>> ignored

So it doesn't seem to like starting another cursor while iterating over a server-side cursor. Which seems to leave me stuck with a very slow client-side iterator.

Any suggestions?

  • While unrelated to your technical problem (which is that you need to read through the whole result set buffer before starting another query), are you sure you can't rewrite this code as a single INSERT INTO ... SELECT query, where the SELECT performs the logic your some_function was to perform? MySQL can do a lot of things right inside a query. Commented Jan 28, 2011 at 7:08
  • With the table I'm currently working with, I probably could, yes. The larger problem (which I should have stated in the question - always seem to miss something!) is that I will have a lot more of these tables to read from in the future, and I don't know what functions I'll need to perform on those yet. I figured that processing in Python rather than MySQL would future-proof me against whatever comes up. Thanks for the comment though - I may have to resort to MySQL processing if necessary. Commented Jan 28, 2011 at 7:17
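For cases where the transform can be expressed in SQL, the comment's INSERT INTO ... SELECT suggestion might look like the sketch below. The column names and the UPPER()/+1 logic are invented for illustration; only table1 and table2 come from the question.

```python
# Hypothetical server-side transform: no rows ever cross the wire,
# so the client-side vs. server-side cursor problem never arises.
TRANSFORM_SQL = """
    INSERT INTO table2 (col_a, col_b, col_c)
    SELECT col_a, UPPER(col_b), col_c + 1
    FROM table1
"""

def run_transform():
    # Imported lazily so the SQL above can be inspected without the driver.
    import MySQLdb
    conn = MySQLdb.connect(host="somehost", user="someuser",
                           passwd="somepassword", db="somedb")
    try:
        cur = conn.cursor()
        cur.execute(TRANSFORM_SQL)
        conn.commit()
        cur.close()
    finally:
        conn.close()
```

This only works when the per-row logic fits into SQL expressions, which is exactly the limitation the comment thread discusses.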

1 Answer


You need a separate connection to the database: since the first connection is busy streaming the result set, you can't run the insert query on it.

Try this:

import MySQLdb
import MySQLdb.cursors

conn=MySQLdb.connect(
    host="somehost",user="someuser",
    passwd="somepassword",db="somedb")

cursor1 = conn.cursor(MySQLdb.cursors.SSCursor)
query1 = "SELECT * FROM table1"
cursor1.execute(query1)

# second connection, used only for the inserts
insertConn=MySQLdb.connect(
    host="somehost",user="someuser",
    passwd="somepassword",db="somedb")
cursor2 = insertConn.cursor(MySQLdb.cursors.Cursor)

for row in cursor1:
    values = some_function(row)
    query2 = "INSERT INTO table2 VALUES (%s, %s, %s)"
    cursor2.execute(query2, values)

cursor2.close()
cursor1.close()
conn.commit()
conn.close()
insertConn.commit()
insertConn.close()

