
I have a MySQL database, and I set its charset to utf8:

...
  PRIMARY KEY  (`username`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 | 
...

I connect to the database from Python with MySQLdb:

import MySQLdb
import MySQLdb.cursors

conn = MySQLdb.connect(host="localhost",
                       passwd="12345",
                       db="db",
                       charset="utf8",
                       use_unicode=True)

When I execute a query, the response looks as if it has been decoded with "windows-1254". Example:

curr = conn.cursor(MySQLdb.cursors.DictCursor)
select_query = 'SELECT * FROM users'
curr.execute(select_query)

for ret in curr.fetchall():
    username = ret["username"]
    print "repr-username; ", repr(username)
    print "username; ", username.encode("utf-8")
...

Output:

repr-username;  u'\xc5\u0178\xc3\xbckr\xc3\xbc\xc3\xa7a\xc4\u0178l\xc3\xbcli'
username;  ÅŸÃ¼krÃ¼Ã§aÄŸlÃ¼li

When I encode username with "windows-1254" instead, it prints correctly:

...
print "repr-username; ", repr(username)
print "username; ", username.encode("windows-1254")
...

Output:

repr-username;  u'\xc5\u0178\xc3\xbckr\xc3\xbc\xc3\xa7a\xc4\u0178l\xc3\xbcli'
username;  şükrüçağlüli

When I try other characters, such as the Cyrillic alphabet, the decoding seems to change dynamically. How can I prevent this?

  • To be clear, "şükrüçağlüli" is the output you want? Commented Aug 28, 2014 at 12:55
  • Yes. This text has some Turkish special characters like "şüçğ". Commented Aug 28, 2014 at 12:57
  • Is that the charset of the table as well? Commented Aug 28, 2014 at 13:08
  • Terminal encoding? Another idea: could you modify your test case to both INSERT and SELECT from Python? Does the problem persist? Commented Aug 28, 2014 at 13:12
  • On my UTF-8 system u"şükrüçağlüli" == u'\u015f\xfckr\xfc\xe7a\u011fl\xfcli'. This is not what you have. Are you certain the data were properly encoded at INSERT time? Commented Aug 28, 2014 at 13:19

1 Answer


I think the items were encoded incorrectly when they were INSERTed into the database.
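
You can reproduce the exact string from the question by taking the UTF-8 bytes of the correct name and decoding them with cp1252, which is the code page MySQL calls latin1. That is the typical result of UTF-8 text being sent over a latin1 connection at INSERT time, although I am assuming that is what happened here. The snippet is Python 3 (the question uses Python 2), and `correct` and `garbled` are just illustrative names:

```python
# Assumption: the original INSERT sent UTF-8 bytes over a connection that
# MySQL treated as latin1 (MySQL's latin1 is the Windows cp1252 code page).
correct = "şükrüçağlüli"

# Misread the UTF-8 bytes as cp1252, one character per byte.
garbled = correct.encode("utf-8").decode("cp1252")

# ascii() escapes every non-ASCII character, so the output can be compared
# directly with the repr shown in the question.
print(ascii(garbled))
```

The printed value is `'\xc5\u0178\xc3\xbckr\xc3\xbc\xc3\xa7a\xc4\u0178l\xc3\xbcli'`, the same as `repr(username)` in the question. It also explains the windows-1254 workaround: cp1254 and cp1252 map every character of this garbled string to the same byte, so `encode("windows-1254")` happens to return the original UTF-8 bytes. UTF-8 Cyrillic text, however, contains bytes such as 0xD0 on which the two code pages disagree, which may be why the behaviour seems to change with other alphabets.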

I recommend python-ftfy (https://github.com/LuminosoInsight/python-ftfy); it helped me with a similar problem:

import ftfy

username = u'\xc5\u0178\xc3\xbckr\xc3\xbc\xc3\xa7a\xc4\u0178l\xc3\xbcli'
print ftfy.fix_text(username) # outputs şükrüçağlüli
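
If you would rather not add a dependency, this particular kind of damage can also be reversed with the standard library: encode the garbled text back with the single-byte code page that misread it, then decode the resulting bytes as UTF-8. The helper below is a hypothetical Python 3 sketch, not part of ftfy or MySQLdb; the name `repair_mojibake` and the candidate list (cp1252, i.e. MySQL's latin1, then cp1254) are my assumptions. It returns the text unchanged when no candidate round-trips cleanly, but it is only a heuristic, and ftfy is more careful about false positives.

```python
def repair_mojibake(text, candidates=("cp1252", "cp1254")):
    """Undo one round of "UTF-8 bytes decoded with a single-byte code page".

    Hypothetical helper: tries each candidate code page in order and returns
    the first successful repair, or the original text if none applies.
    """
    for codec in candidates:
        try:
            # Recover the original bytes, then read them as UTF-8.
            return text.encode(codec).decode("utf-8")
        except UnicodeError:
            # Either a character is not in this code page (encode fails)
            # or the bytes are not valid UTF-8 (decode fails): try the next.
            continue
    return text


print(repair_mojibake("\xc5\u0178\xc3\xbckr\xc3\xbc\xc3\xa7a\xc4\u0178l\xc3\xbcli"))  # şükrüçağlüli
print(repair_mojibake("şükrüçağlüli"))  # already correct, returned as-is
```

This only repairs rows that are already stored. Assuming the INSERT diagnosis is right, new rows stay clean only if the code doing the INSERT also talks to MySQL over a utf8 connection (for example with the same `charset='utf8'` setting used in the question).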