
I'm working on a Python application and having some problems handling strings.

There is this string "She’s Out of My League" (without quotes). I stored it in a variable and tried to insert it into an sqlite3 database, but I get this error:

sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

So, I tried to convert the string to unicode. I tried both of these:

new_str = unicode(old_str)
new_str = old_str.encode("utf8")

But this gives me another error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 49: unexpected code byte

I'm stuck here. What am I doing wrong?

  • Try .decode instead of .encode. Commented May 24, 2011 at 19:23
  • You want old_str.decode(encoding). You don't need to (in fact, you can't) encode it back to a bytestring for use with sqlite; sqlite requires unicode. Commented May 24, 2011 at 20:13
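The fix the commenters describe is to decode the raw bytes once and hand sqlite3 a text string. Here is a minimal Python 3 sketch of that, where `bytes.decode` plays the role of Python 2's `str.decode` and `str` is the Unicode type. It assumes the source bytes are cp1252-encoded (the question doesn't say what they are), and the table `titles` is hypothetical.

```python
import sqlite3

# Raw bytes as they might arrive from a file or a filename; 0x92 is not valid UTF-8.
raw = b"She\x92s Out of My League"

# Decode once, at the boundary. cp1252 is an assumption: use the real source encoding.
title = raw.decode("cp1252")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (name TEXT)")  # hypothetical table
# Passing a str through a ? placeholder stores it as TEXT.
conn.execute("INSERT INTO titles (name) VALUES (?)", (title,))
stored = conn.execute("SELECT name FROM titles").fetchone()[0]
conn.close()

print(ascii(stored))  # ascii() keeps the output printable on any terminal
```

In Python 3 the ProgrammingError from the question no longer occurs: a `bytes` value would simply be stored as a BLOB instead of being rejected. Decoding first is therefore still what gets you a searchable TEXT column.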

1 Answer


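Simple: you're assuming the bytes are UTF-8. Byte 0x92 isn't valid UTF-8, but in cp1252 (Windows-1252) it is the right single quotation mark (’) in "She’s".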

>>> print 'She\x92s Out of My League'.decode('cp1252')
She’s Out of My League
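The snippet above is Python 2. The same diagnosis in Python 3 follows, using only the byte string from the answer. It shows that UTF-8 rejects 0x92 and that cp1252 maps it to U+2019 (RIGHT SINGLE QUOTATION MARK). Python 3 words the error as "invalid start byte" rather than the "unexpected code byte" quoted in the question.

```python
raw = b"She\x92s Out of My League"

utf8_error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0x92 (0b10010010) is a UTF-8 continuation byte, so it cannot begin a character.
    utf8_error = (exc.start, exc.reason)
print(utf8_error)  # (3, 'invalid start byte')

decoded = raw.decode("cp1252")
# In Windows-1252, byte 0x92 decodes to U+2019.
print(hex(ord(decoded[3])))  # 0x2019
```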

5 Comments

So, will cp1252 work for everything? I'm dealing with filenames here, on both Windows and Unix.
Yeah, I get that. I want something that works with all the characters allowed in a filename. Which one do I choose?
There isn't any one encoding you can use, unless you force a single encoding on all input to your software. Have fun!
sys.getfilesystemencoding() returns a guess at the filesystem encoding of the current system, and all path functions (e.g. os.path.join, os.listdir) return unicode (using this guessed encoding) if you give them unicode arguments. Also, if you're using cp1252 on a Unix system, you might consider switching to utf8 to avoid bigger issues.
Always use Unicode strings for filenames (and probably for everything else except raw byte arrays without textual interpretation). Then Unicode file names will be handled correctly for both Windows and Unix-like systems.
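Below is a small Python 3 sketch of the filename advice in these comments. In Python 3 `str` is the Unicode type, and `os.listdir` follows the rule described above: pass `str` and you get `str` back, pass `bytes` and you get raw `bytes`. The filename is a hypothetical example, and the sketch assumes the filesystem encoding can represent U+2019 (true with UTF-8, the usual default on current systems).

```python
import os
import sys
import tempfile

# The interpreter's guess at how filenames are encoded on this system.
print(sys.getfilesystemencoding())

with tempfile.TemporaryDirectory() as d:
    name = "She\u2019s Out of My League.txt"  # hypothetical filename
    with open(os.path.join(d, name), "w", encoding="utf-8") as f:
        f.write("demo")

    text_names = os.listdir(d)               # str argument -> str results
    byte_names = os.listdir(os.fsencode(d))  # bytes argument -> bytes results

    # os.fsdecode applies the same filesystem encoding to turn bytes back into str.
    print(text_names == [name], os.fsdecode(byte_names[0]) == name)
```

Working with `str` paths throughout lets the interpreter handle the platform's encoding, which is what the last comment recommends.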
