2

I'm trying to hand off data from Pandas (imported from a CSV) to a MySQL database via Python's MySQLdb library. I'm having trouble when literal backslashes come into play. I escape the single backslashes from the raw input, so Python knows they're a literal backslash, not an escape on the subsequent character. But when I try to execute an INSERT query, MySQLdb says there's a syntax error. But here's the confusing & frustrating part: If I copy/paste the exact string into MySQL, it executes without problem.

I've tried to keep the data & structure as close to possible as the actual data, but changed it to retain privacy. Note that there are two similarly-offending values, at the end of the SourceSystemID column in the first row, and the MiddleInitial column in the second row.

In [39]: test
Out[39]: 

  ehrSystemID SourceSystemID LastName FirstName MiddleInitial  Sex  
0   fakePlace           ABC\      NaN       NaN           NaN  NaN   
1   fakePlace            XYZ    Smith      John             \    M   
          npi  deaNumber LicenseNumber ProvSpecialty  dob  
0  1234567890  AB1234567       !123456      Internal  NaN  
1         NaN        NaN       B123456      Internal  NaN  

The values of these rows converted to a string to append to the end of an INSERT statement (note that all MySQL columns will be varchar, so all values are wrapped in single-quotes)

In [40]: testVals
Out[40]: "('fakePlace', 'ABC\\', '', '', '', '', '1234567890', 'AB1234567', '!123456', 'Internal', ''), ('fakePlace', 'XYZ', 'Smith', 'John', '\\', 'M', '', '', 'B123456', 'Internal', '')"

The command I pass to MySQLdb and resulting error:

In [41]: testCmd1
Out[41]: "INSERT INTO source_providers (ehrSourceID, sourceSystemID, nameLast, nameFirst, nameMiddle, sex, npiRaw, dea, licenseNumber, specialty1, dobRaw) VALUES ('fakePlace', 'ABC\\', '', '', '', '', '1234567890', 'AB1234567', '!123456', 'Internal', ''), ('fakePlace', 'XYZ', 'Smith', 'John', '\\', 'M', '', '', 'B123456', 'Internal', '')"

In [42]: db.Cur.execute(testCmd1)
---------------------------------------------------------------------------
ProgrammingError                          Traceback (most recent call last)
<ipython-input-42-32fe62e740d8> in <module>()
----> 1 db.Cur.execute(testCmd1)

/Library/Python/2.7/site-packages/MySQL_python-1.2.4b4-py2.7-macosx-10.8-intel.egg/MySQLdb/cursors.pyc in execute(self, query, args)
    200             del tb
    201             self.messages.append((exc, value))
--> 202             self.errorhandler(self, exc, value)
    203         self._executed = query
    204         if not self._defer_warnings: self._warning_check()

/Library/Python/2.7/site-packages/MySQL_python-1.2.4b4-py2.7-macosx-10.8-intel.egg/MySQLdb/connections.pyc in defaulterrorhandler(***failed resolving arguments***)
     34     del cursor
     35     del connection
---> 36     raise errorclass, errorvalue
     37 
     38 re_numeric_part = re.compile(r"^(\d+)")

ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '1234567890', 'AB1234567', '!123456', 'Internal', ''), ('fakePlace', 'XYZ', 'Smit' at line 1")

Successfully executing the exact command in MySQL directly:

mysql> INSERT INTO source_providers (ehrSourceID, sourceSystemID, nameLast, nameFirst, nameMiddle, sex, npiRaw, dea, licenseNumber, specialty1, dobRaw) VALUES ('fakePlace', 'ABC\\', '', '', '', '', '1234567890', 'AB1234567', '!123456', 'Internal', ''), ('fakePlace', 'XYZ', 'Smith', 'John', '\\', 'M', '', '', 'B123456', 'Internal', '');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

Is there something that happens when the string is passed to the MySQL API? How can I avoid filtering out the offending rows and manually inserting them separately? This involves many tens of thousands of rows, thus my focus on automating the process.

Thanks.

2
  • Did you try double escaping backslashes, i.e., r'\\'? You can do this with re.escape as well. Since the actual query string contains two backslashes you have to do '\\\\' (two) instead of a single backslash '\\'. Commented Aug 29, 2013 at 16:12
  • 1
    Turns out the problem is how Python prints the string, automatically inserting escapes where there are none, so it appeared to be double-slashed, while it was not. I'll provide an Answer to the question with some more detail. Thanks, though, Phillip! Commented Aug 29, 2013 at 17:19

1 Answer 1

5

It turns out the confusion was in how Python printed the query string, combined with a bug in my own code. First, the bug: I incorrectly used str.replace() to replace the single literal backslash with an escaped double:

sqlCmd.replace('\\', '\\\\') 

So when Python showed the double-slash when printing the string, I assumed it had successfully replaced the single-slash. The correct code (in case anyone who reads this makes the same mistake) is:

sqlCmd = sqlCmd.replace('\\', '\\\\')

The second source of confusion was that the Python interpreter uses __repr__() which automatically inserts an escape backslash for displaying but this escape backslash isn't actually in the original string. Python assumes you're smart enough to know this. Turns out I wasn't. ;-)

A short additional explanation of __repr__() and backslashes is available in an answer to another stackoverflow question here.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.