9

I need to call MySQL stored procedure from my python script. As one of parameters I'm passing a unicode string (Russian language), but I get an error;

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: ordinal not in range(256)

My script:

  self.db=MySQLdb.connect("localhost", "usr", "pass", "dbName")
  self.cursor=self.db.cursor()
  args=("какой-то текст") #this is string in russian
  self.cursor.callproc('pr_MyProc', args)
  self.cursor.execute('SELECT @_pr_MyProc_2') #getting result from sp
  result=self.cursor.fetchone()
  self.db.commit()

I've read that setting charset='utf8' shuld resolve this problem, but when I use string:

self.db=MySQLdb.connect("localhost", "usr", "pass", "dbName", charset='utf8')

This gives me another error;

UnicodeEncodeError: 'utf-8' codec can't encode character '\udcd1' in position 20: surrogates not allowed

Also I've trying to set parametr use_unicode=True, that's not working.

6
  • Does it work if you send the string in unicode: args=(u"какой-то текст")? Commented Aug 23, 2016 at 6:21
  • @DanielRoseman no, I've tryed it too. Commented Aug 23, 2016 at 6:29
  • is this python 2 or 3? Commented Aug 23, 2016 at 17:53
  • try stackoverflow.com/questions/27366479/… Commented Aug 30, 2016 at 5:56
  • what's also strange is that that Cyrillic (?) string can be decoded into UTF-8 without issue. Commented Aug 30, 2016 at 16:17

8 Answers 8

5

More things to check on: http://mysql.rjweb.org/doc.php/charcoll#python

Likely items:

  • Start code file with # -*- coding: utf-8 -*- -- (for literals in code)
  • Literals should be u'...'

Can you extract the HEX? какой-то текст should be this in utf8: D0BA D0B0 D0BA D0BE D0B9 2D D182 D0BE D182 20 D0B5 D0BA D181 D182

Sign up to request clarification or add additional context in comments.

2 Comments

As I understand "# -- coding: utf-8 --" sets encoding for file, not for strings. So it does'not help.
Does it hurt in your situation?
3
+25

Here are some thoughts. Maybe not a response. I've been playing with python/mysql/utf-8/unicode in the past and this is the things i remember:

Looking at Saltstack mysql module's comment :

https://github.com/saltstack/salt/blob/develop/salt/modules/mysql.py#L314-L322

# MySQLdb states that this is required for charset usage
# but in fact it's more than it's internally activated
# when charset is used, activating use_unicode here would
# retrieve utf8 strings as unicode() objects in salt
# and we do not want that.
#_connarg('connection_use_unicode', 'use_unicode')
connargs['use_unicode'] = False
_connarg('connection_charset', 'charset')

We see that to avoid altering the result string the use_unicode is set to False, while the charset (which could be utf-8) is set as a parameter. use_unicode is more a 'request' to get responses as unicode strings.

You can check real usage in the tests, here: https://github.com/saltstack/salt/blob/develop/tests/integration/modules/test_mysql.py#L311-L361 with a database named '標準語'.

Now about the message UnicodeEncodeError: 'utf-8' codec can't encode character '\udcd1' **. You are using **unicode but you tell the module it is utf-8. It is not utf-8 until you encode your unicode string in utf-8.

Maybe you should try with:

args=(u"какой-то текст".encode('utf-8'))

At least in python3 this is required, because your "какой-то текст" is not in utf-8 by default.

1 Comment

Link is now broken: I found the new link: github.com/saltstack/salt/blob/develop/tests/integration/… (Note file_name change)
2

The MySQLdb module is not compatible with python 3. That might be why you are getting problems. I would advise to use a different connector, like PyMySQL or mysqlclient.

Related: 23376103.

1 Comment

I'm using mysqlclinet
2

Maybe you can reload your sys in utf-8 and try to decode the string into utf-8 as following :

import sys
reload(sys)
sys.setdefaultencoding("utf-8")

...

stringUtf8 = u''.join(string_original).decode('utf-8')

Comments

2

I had a similar problem very recently but with PostgreSQL. After trying tons of suggestions from SO/ internet, I realized the issue was with my database. I had to drop my database and reinstall Postgres, because for some reason it was not allowing me to change the database's default collation. I was in a hurry so couldn't find a better solution, but would recommend the same, since I was only starting my application in the deployment environment. All the best.

Comments

1

What's your database's charset?
use :

show variables like "characetr%"; 

or see your database's charset

Comments

1

I see here two problems.

  1. You have unicode but you try to define it as utf-8 by setting parameter "charset". You should first encode your unicode to utf-8 or another encoding system.

  2. If it however doesn't work, try to do so with init_command='SET NAMES UTF8' parameter.

So it will look like:

conn = MySQLdb.connect(charset='utf8', init_command='SET NAMES UTF8')

You can try also this:

cursor = db.cursor()

cursor.execute("SET NAMES UTF8;")

Comments

0

I encountered a similar issue, which was caused by invalid utf-8 data in the database; it seems that MySQL doesn't care about that, but Python does, because it's following the UTF-8 spec, which says that:

  • surrogate pairs are not allowed in utf-8
  • unpaired surrogates are not allowed in utf-8

If you want to "make it work", you'll have to intercept the MySQL packet and use your own converter which will perform ad-hoc replacements.

Here's one way to "handle" invalid data containing surrogates:

def borked_utf8_decode(data):
    """
    Work around input with unpaired surrogates or surrogate pairs,
    replacing by XML char refs: look for "&#\d+;" after.
    """
    return data.decode("utf-8", "surrogatepass") \
     .encode("utf-8", "xmlcharrefreplace") \
     .decode("utf-8")

Note that the proper way to handle that is context-dependent, but there are some common replacement scenarios, like this one.

And here's one way of plugging this into pymysql (another way is to monkey-patch field processing, see eg. https://github.com/PyMySQL/PyMySQL/issues/631):

import pymysql.converters

# use this in your connection
pymysql_use_unicode = False
conversions = pymysql.converters.conversions
conversions[pymysql.converters.FIELD_TYPE.STRING] = borked_utf8_decode
conversions[pymysql.converters.FIELD_TYPE.VAR_STRING] = borked_utf8_decode
conversions[pymysql.converters.FIELD_TYPE.VARCHAR] = borked_utf8_decode

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.