
I have strings encoded in the following form, which I stored in a SQLite VARCHAR field from Python: La+Cit%C3%A9+De+la+West

These are apparently UTF-8 encoded byte strings that were then urlencoded. The question is how to convert one back to a unicode string, for example s = 'La+Cit%C3%A9+De+la+West'.

I used the urllib.unquote_plus(s) Python function, but it doesn't convert the %C3%A9 into a single unicode character. I see 'La CitÃ© De la West' instead of the expected 'La Cité De la West'.

I'm running my code on Ubuntu, not Windows, and the encoding is UTF-8.

1 Answer


As we discussed, it looks like the problem was that you were starting with a unicode object, not a byte string (str). Pass unquote_plus a str instead:

>>> import urllib
>>> s1 = u'La+Cit%C3%A9+De+la+West'
>>> type(s1)
<type 'unicode'>
>>> print urllib.unquote_plus(s1)
La CitÃ© De la West

>>> s2 = str(s1)
>>> type(s2)
<type 'str'>
>>> print urllib.unquote_plus(s2)
La Cité De la West

>>> import sys
>>> sys.stdout.encoding
'UTF-8'
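
Why the unicode input comes out garbled can be sketched in two steps: the percent-escapes decode to raw bytes, and those bytes then have to be interpreted under some encoding. The snippet below is only an illustration and assumes Python 3 (the session above is Python 2), where the functions live in urllib.parse. Reading the bytes as Latin-1 is used here to reproduce the garbled output; it is an illustration of the effect, not a description of Python 2's internals.

```python
# Python 3 sketch (assumption: the session above is Python 2).
from urllib.parse import unquote_plus, unquote_to_bytes

s = 'La+Cit%C3%A9+De+la+West'

# Step 1: percent-escapes decode to raw bytes; '+' stands for a space.
raw = unquote_to_bytes(s.replace('+', ' '))

# Step 2a: treating each byte as its own character (Latin-1) gives the garbled text.
wrong = raw.decode('latin-1')

# Step 2b: treating the bytes as UTF-8 gives the intended text.
right = raw.decode('utf-8')

# unquote_plus performs both steps and decodes as UTF-8 by default.
same = unquote_plus(s)

# An already-garbled string can be repaired by reversing step 2a.
repaired = wrong.encode('latin-1').decode('utf-8')
```

The last line is a possible repair for values that were already decoded the wrong way; it only works if the garbled text really came from UTF-8 bytes read one byte per character.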

4 Comments

Your example works the same way for me. Could the cause be that the string is extracted from a VARCHAR field in a SQLite database?
type(s) returns str with your example. With s = u"La+Cit%C3%A9+De+la+West", type(s) returns unicode and print unquote_plus(s) prints 'La CitÃ© De la West'. The problem is thus the initial type of s. Using print unquote_plus(str(s)) solves my problem.
Much better. I made a few edits to make it clearer and added the import statement for inexperienced programmers.
The problem was that SQLite returns CHAR columns as unicode strings, not str strings. Calling str() before unquote_plus() solved my problem.
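
For readers on Python 3, the round trip through SQLite might look like the sketch below. It is a minimal illustration under assumptions: the table name places and the helper decode_place are hypothetical, and an in-memory database stands in for the real one. In Python 3 the sqlite3 module returns text columns as str, and urllib.parse.unquote_plus decodes percent-escapes as UTF-8, so no str() conversion is needed.

```python
import sqlite3
from urllib.parse import unquote_plus


def decode_place(value):
    """Decode a urlencoded column value (hypothetical helper for this sketch)."""
    # Accept bytes as well, in case the column was read as a BLOB.
    if isinstance(value, bytes):
        value = value.decode('ascii')
    # Percent-escapes are interpreted as UTF-8; '+' becomes a space.
    return unquote_plus(value, encoding='utf-8')


# In-memory database and table name are assumptions for illustration.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE places (name VARCHAR(100))')
conn.execute('INSERT INTO places (name) VALUES (?)', ('La+Cit%C3%A9+De+la+West',))

(stored,) = conn.execute('SELECT name FROM places').fetchone()
decoded = decode_place(stored)
conn.close()
```

On Python 2, s.encode('ascii') is a more explicit equivalent of str(s) for these values, since a urlencoded string contains only ASCII characters.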
