I am trying to convert text into URLs, but certain characters are not being converted as I expect. For example:

>>> import urllib.parse
>>> my_text="City of Liège"
>>> my_url=urllib.parse.quote(my_text,safe='')
>>> my_url
'City%20of%20Li%C3%A8ge'

The spaces are converted properly, but I expected the "è" to become %E8; instead it comes back as %C3%A8. What am I missing? I am using Python 3.6.

1 Answer

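In Python 3 a str holds Unicode code points, and urllib.parse.quote() encodes it as UTF-8 by default before percent-escaping the resulting bytes. The URL-encoded string reflects this.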

0xC3A8 is the UTF-8 encoding of the Unicode value U+00E8, which is described as "LATIN SMALL LETTER E WITH GRAVE".
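
As a quick check of the two byte sequences involved, the sketch below encodes the character both ways using only the built-in str.encode. It assumes cp1252 is the single-byte encoding the question has in mind; the variable names are illustrative.

```python
ch = "\u00e8"  # LATIN SMALL LETTER E WITH GRAVE ("è")

# UTF-8 needs two bytes for this code point.
utf8_bytes = ch.encode("utf-8")

# cp1252 (like Latin-1) stores it in a single byte.
cp1252_bytes = ch.encode("cp1252")

print(utf8_bytes.hex())    # c3a8
print(cp1252_bytes.hex())  # e8
```

Percent-encoding simply writes each of those bytes as %XX, which is why the same character can appear as either %C3%A8 or %E8.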

To get %E8 instead, encode the text with the single-byte code page you want (here cp1252) before passing it to quote(), like this:

my_text=bytes("City of Liège",'cp1252')
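
An alternative sketch, not part of the original answer: urllib.parse.quote() also takes an encoding argument for str input, so the text can stay a str. The example below compares the default with cp1252 and with the bytes approach shown above; the variable names are illustrative.

```python
from urllib.parse import quote

my_text = "City of Li\u00e8ge"  # "City of Liège"

# Default: the str is encoded as UTF-8 before percent-escaping.
utf8_url = quote(my_text, safe="")

# Ask quote() to encode the str as cp1252 instead.
cp1252_url = quote(my_text, safe="", encoding="cp1252")

# Equivalent: pass bytes that are already cp1252-encoded.
bytes_url = quote(my_text.encode("cp1252"), safe="")

print(utf8_url)    # City%20of%20Li%C3%A8ge
print(cp1252_url)  # City%20of%20Li%E8ge
print(bytes_url)   # City%20of%20Li%E8ge
```

Which form is right depends on what the receiving server expects. UTF-8 is the common convention for percent-encoded URLs, so %E8 is probably only appropriate when the target explicitly uses a legacy single-byte encoding.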

6 Comments

So how should I change my code in order to get the desired result?
Try replacing the assignment with this and see what happens: my_text = "City of Liège".decode('utf-8')
I don't have the same version of python as you do, so I can't do the urllib.parse.quote() step to verify that it works. Sorry about that. But what I can do is see that the string prints as u'City of Li\xe8ge', which looks more like what you're after.
I get AttributeError: 'str' object has no attribute 'decode'. I am using Python 3.6.1
Based on your comment, I needed to convert the text to cp1252-encoded bytes, like this: my_text=bytes("City of Liège",'cp1252'). If you change your answer to incorporate this, I will accept it.