how to encode url in python

Question

I have created a function for decoding url.

from urllib import unquote

def unquote_u(source):
  result = source
  if '%u' in result:
    result = result.replace('%u','\\u').decode('unicode_escape')
  result = unquote(result)
  print result
  return result

if __name__=='__main__':
    unquote_u('{%22%22%3A%22test_%E5%93%A6%E4%BA%88%E4%BB%A5%E8%85%BF%E5%93%A6.doc.txt%22%2C%22mimeType%22%3A%22text%2Fplain%22%2C%22compressed%22%3Afalse%7D')

But, I am not ale to get proper file name. proper file name is : test_哦予以腿哦.doc

can anyone tell me how to do that?

why downvote the question? the guy is making an effort, and he asked the question clearly. +1 just to counteract one of the downvotes — jcomeau_ictx
– jcomeau_ictx, Commented Jun 30, 2011 at 5:51

jcomeau_ictx · Accepted Answer · 2011-06-30 12:46:01Z

5

urllib.unquote can do it:

>>> urllib.unquote('{%22%22%3A%22test_%E5%93%A6%E4%BA%88%E4%BB%A5%E8%85%BF%E5%93%A6.doc.txt%22%2C%22mimeType%22%3A%22text%2Fplain%22%2C%22compressed%22%3AFalse%7D')
'{"":"test_\xe5\x93\xa6\xe4\xba\x88\xe4\xbb\xa5\xe8\x85\xbf\xe5\x93\xa6.doc.txt","mimeType":"text/plain","compressed":False}'
>>> eval(_)
{'': 'test_\xe5\x93\xa6\xe4\xba\x88\xe4\xbb\xa5\xe8\x85\xbf\xe5\x93\xa6.doc.txt', 'mimeType': 'text/plain', 'compressed': False}
>>> _['']
'test_\xe5\x93\xa6\xe4\xba\x88\xe4\xbb\xa5\xe8\x85\xbf\xe5\x93\xa6.doc.txt'
>>> print _
test_哦予以腿哦.doc.txt

Note that I had to change "false" to "False" in the quoted string. Also that the string after unquote is still UTF-8 encoded; you can use str.decode('utf8') to get a Unicode string if that is what you require.

As JBernardo mentions, eval() of unsafe data is a very bad idea. Anybody knowing, or even suspecting, that a server-side script is eval()-ing form data can easily craft a POST with commands that can compromise the server. Better would be this:

>>> import json, urllib
>>> json.loads(urllib.unquote('{%22%22%3A%22test_%E5%93%A6%E4%BA%88%E4%BB%A5%E8%85%BF%E5%93%A6.doc.txt%22%2C%22mimeType%22%3A%22text%2Fplain%22%2C%22compressed%22%3Afalse%7D'))['']
u'test_\u54e6\u4e88\u4ee5\u817f\u54e6.doc.txt'
>>> print _
test_哦予以腿哦.doc.txt

Also note that this later approach didn't require changing false to False; in fact it doesn't work if I do. The json package takes care of that.

edited Jun 30, 2011 at 12:46

answered Jun 30, 2011 at 5:45

jcomeau_ictx

38.6k7 gold badges98 silver badges114 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

JBernardo Over a year ago

ast.literal_eval(_) or eval(_, {}) would be better

Ryan Ye · Accepted Answer · 2011-06-30 05:56:18Z

1

One thing to add, after get unquoted url from urllib.unquote(url), you probably need use decode('utf8') to convert the raw string into a unicode string.

answered Jun 30, 2011 at 5:56

Ryan Ye

3,2691 gold badge25 silver badges26 bronze badges

Collectives™ on Stack Overflow

how to encode url in python

2 Answers 2

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related