I have a file with the following two strings:

25_%D1%80%D0%B0%D1%88%D3%99%D0%B0%D1%80%D0%B0
25_\xD1\x80\xD0\xB0\xD1\x88\xD3\x99\xD0\xB0\xD1\x80\xD0\xB0

They both represent the same URL path, and therefore should be equal. I would like to apply the same "cleaning function" to both of them, obtaining the same string.

After reading these strings from the file I have:

>> s0
'25_%D1%80%D0%B0%D1%88%D3%99%D0%B0%D1%80%D0%B0'
>> s1
'25_\\xD1\\x80\\xD0\\xB0\\xD1\\x88\\xD3\\x99\\xD0\\xB0\\xD1\\x80\\xD0\\xB0'

(note the escaped backslashes in s1). If I unquote s0 I get the following:

>> import urllib
>> t0 = urllib.unquote(s0)
'25_\xd1\x80\xd0\xb0\xd1\x88\xd3\x99\xd0\xb0\xd1\x80\xd0\xb0'
>> print t0
25_рашәара

which is good. However, the only thing I know how to do with s1 is the following:

>> t1 = s1.decode("unicode_escape")
u'25_\xd1\x80\xd0\xb0\xd1\x88\xd3\x99\xd0\xb0\xd1\x80\xd0\xb0'
>> print t1
25_ÑаÑÓаÑ

which looks broken. My question is: what clean(s) function could be written to normalize these two strings, so that they are either both <type 'str'> or both <type 'unicode'>, and so that they print identically and compare equal?

1 Answer

Consider:

>>> s0 = '25_%D1%80%D0%B0%D1%88%D3%99%D0%B0%D1%80%D0%B0'
>>> s1 = '25_\\xD1\\x80\\xD0\\xB0\\xD1\\x88\\xD3\\x99\\xD0\\xB0\\xD1\\x80\\xD0\\xB0'
>>> import urllib
>>> t0 = urllib.unquote(s0).decode('utf8')
>>> t1 = s1.decode('string_escape').decode('utf8')
>>> print t0
25_рашәара
>>> print t1
25_рашәара
>>> t0 == t1
True
>>> 
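The two decoding steps in this session can be wrapped into the single clean(s) function the question asks for. The sketch below is a Python 3 port, not the Python 2 code shown here, because the string_escape codec does not exist in Python 3. The name clean comes from the question; that the input is pure ASCII and that the escaped bytes are UTF-8 are assumptions.

```python
from urllib.parse import unquote_to_bytes


def clean(s):
    """Normalize a %XX-quoted or \\xXX-escaped ASCII string to text.

    Assumes the input is ASCII and the escaped bytes are UTF-8.
    """
    # Turn literal "\xNN" sequences into code points U+00NN, then map
    # those code points back to single bytes via latin-1.
    raw = s.encode("ascii").decode("unicode_escape").encode("latin-1")
    # Turn "%NN" sequences into bytes; bytes without "%" pass through.
    raw = unquote_to_bytes(raw)
    # Interpret the resulting byte string as UTF-8 text.
    return raw.decode("utf-8")


s0 = '25_%D1%80%D0%B0%D1%88%D3%99%D0%B0%D1%80%D0%B0'
s1 = '25_\\xD1\\x80\\xD0\\xB0\\xD1\\x88\\xD3\\x99\\xD0\\xB0\\xD1\\x80\\xD0\\xB0'
same = clean(s0) == clean(s1)
```

Both inputs go through the same code path, so neither form needs to be detected up front. Input that mixes in \uNNNN escapes above U+00FF would fail at the latin-1 step and would need separate handling.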

2 Comments

This works! I actually tried s1.decode('unicode_escape'), which didn't work. Would you give me a quick pointer as to the difference between string_escape and unicode_escape?
@Gonzalo: string_escape is for strings, unicode_escape for unicodes ;)
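To spell out the difference: in Python 2, string_escape produces a byte str, so \xD1\x80 becomes the two bytes D1 80, which then decode as UTF-8. unicode_escape produces a unicode object and maps each \xNN to the code point U+00NN, so the same input becomes the two characters U+00D1 and U+0080, which is why the printed result looked broken. A small sketch in Python 3 (where only unicode_escape remains) shows the effect; the variable names are illustrative.

```python
# The eight ASCII characters \xD1\x80, as read from a file.
escaped = b"\\xD1\\x80"

# unicode_escape maps every \xNN to the code point U+00NN,
# giving two characters instead of one Cyrillic letter.
as_code_points = escaped.decode("unicode_escape")

# Mapping those code points back to bytes (latin-1) and then
# decoding as UTF-8 recovers the intended single character.
as_text = as_code_points.encode("latin-1").decode("utf-8")

print(len(as_code_points), len(as_text))
```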
