If I do
url = "http://example.com?p=" + urllib.quote(query)
- It doesn't encode / to %2F (breaks OAuth normalization)
- It doesn't handle Unicode (it throws an exception)
Is there a better library?
From the Python 3 documentation:
urllib.parse.quote(string, safe='/', encoding=None, errors=None)
Replace special characters in string using the %xx escape. Letters, digits, and the characters '_.-~' are never quoted. By default, this function is intended for quoting the path section of a URL. The optional safe parameter specifies additional ASCII characters that should not be quoted — its default value is '/'.
That means passing '' for safe will solve your first issue:
>>> import urllib.parse
>>> urllib.parse.quote('/test')
'/test'
>>> urllib.parse.quote('/test', safe='')
'%2Ftest'
(The function quote was moved from urllib to urllib.parse in Python 3.)
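Putting both fixes together for the URL from the question, here is a minimal Python 3 sketch. The helper name build_url and the sample query string are made up for illustration; only quote() and its safe parameter come from the documentation above.

```python
from urllib.parse import quote


def build_url(query: str) -> str:
    # build_url is a hypothetical helper, not part of urllib.
    # safe='' makes quote() percent-encode '/' as well (%2F);
    # str input is encoded as UTF-8 by default, so non-ASCII text works.
    return "http://example.com?p=" + quote(query, safe="")


url = build_url("a/b Müller")
print(url)  # http://example.com?p=a%2Fb%20M%C3%BCller
```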
By the way, have a look at urlencode.
As for the second issue, there was a bug report about it, and it was fixed in Python 3.
For Python 2, you can work around it by encoding as UTF-8 like this:
>>> query = urllib.quote(u"Müller".encode('utf8'))
>>> print urllib.unquote(query).decode('utf8')
Müller
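On Python 3 the explicit encode step is no longer needed, although quote() still accepts bytes. A small sketch comparing the two forms, plus unquote(), which decodes percent-escapes as UTF-8 by default:

```python
from urllib.parse import quote, unquote

name = "Müller"

# Python 3: a str argument is encoded as UTF-8 before quoting.
from_str = quote(name)
# Passing UTF-8 bytes explicitly mirrors the Python 2 workaround above.
from_bytes = quote(name.encode("utf-8"))

print(from_str)    # M%C3%BCller
print(from_bytes)  # M%C3%BCller
# unquote() decodes the percent-escapes back to text as UTF-8 by default.
print(unquote(from_str))  # Müller
```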
The URI reserved characters are what urllib.quote is dealing with:
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
One suggestion is urllib.parse.quote('http://example.com/some path/').replace('%3A', ':'). A reply points out that urllib.parse.quote(url, safe=':/') does the same, and that it is even better to encode only the some path part and then join the strings. This is Python, not PHP.
In Python 3, urllib.quote has been moved to urllib.parse.quote, and it does handle Unicode by default.
>>> from urllib.parse import quote
>>> quote('/test')
'/test'
>>> quote('/test', safe='')
'%2Ftest'
>>> quote('/El Niño/')
'/El%20Ni%C3%B1o/'
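The encoding parameter from the signature quoted earlier controls which bytes non-ASCII characters turn into. A short sketch; the Latin-1 case is only an illustration of when a legacy server might need something other than the UTF-8 default:

```python
from urllib.parse import quote

text = "/El Niño/"

# Default: non-ASCII characters are encoded as UTF-8 bytes, then escaped.
utf8 = quote(text)
# encoding= picks a different charset (illustrative Latin-1 example).
latin1 = quote(text, encoding="latin-1")

print(utf8)    # /El%20Ni%C3%B1o/
print(latin1)  # /El%20Ni%F1o/
```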
quote is rather vague as a global. It might be nicer to use something like urlencode: from urllib.parse import quote as urlencode.
There is already a urlencode in urllib.parse that does something completely different, so you'd be better off picking another name or risk seriously confusing future readers of your code.
quote is "rather vague", but rather than renaming it to something else, you can leave the name fully qualified as urllib.parse.quote. Leaving it fully qualified takes a little extra time typing and saves time reading and maintaining the code.
I think the requests module is much better. It's based on urllib3.
You can try this:
>>> from requests.utils import quote
>>> quote('/test')
'/test'
>>> quote('/test', safe='')
'%2Ftest'
My answer is similar to Paolo's answer.
requests.utils.quote is a thin compatibility wrapper around urllib.quote for Python 2 and urllib.parse.quote for Python 3.
If you're using Django, you can use urlquote:
>>> from django.utils.http import urlquote
>>> urlquote(u"Müller")
u'M%C3%BCller'
Note that, because Python 3 handles Unicode itself, this is now a legacy wrapper; it was deprecated in Django 3.0 and removed in Django 4.0. From the Django 2.1 source code for django.utils.http:
A legacy compatibility wrapper to Python's urllib.parse.quote() function.
(was used for unicode handling on Python 2)
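On Django versions where urlquote is gone, the standard library gives the same result. A minimal sketch, assuming the Django 2.x definition in which urlquote(url, safe='/') simply returned quote(url, safe):

```python
from urllib.parse import quote

# Assumes urlquote(url, safe='/') just delegated to quote(url, safe),
# so calling quote() directly is a drop-in replacement on Python 3.
result = quote("Müller")
print(result)  # M%C3%BCller
```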
It is better to use urlencode here. There isn't much difference for a single parameter, but, IMHO, it makes the code clearer. (A function called quote_plus can look confusing, especially to people coming from other languages.)
In [21]: query='lskdfj/sdfkjdf/ksdfj skfj'
In [22]: val=34
In [23]: from urllib.parse import urlencode
In [24]: encoded = urlencode(dict(p=query,val=val))
In [25]: print(f"http://example.com?{encoded}")
http://example.com?p=lskdfj%2Fsdfkjdf%2Fksdfj+skfj&val=34
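urlencode uses quote_plus by default, which is why the space became '+'. If the receiver expects %20 instead (OAuth 1.0 signature base strings, for example), the quote_via parameter switches the quoting function. A small sketch reusing the values above:

```python
from urllib.parse import quote, urlencode

params = {"p": "lskdfj/sdfkjdf/ksdfj skfj", "val": 34}

# Default quote_via=quote_plus: spaces become '+'.
plus_style = urlencode(params)
# quote_via=quote: spaces become %20; urlencode's safe='' still encodes '/'.
pct20_style = urlencode(params, quote_via=quote)

print(f"http://example.com?{plus_style}")
# http://example.com?p=lskdfj%2Fsdfkjdf%2Fksdfj+skfj&val=34
print(f"http://example.com?{pct20_style}")
# http://example.com?p=lskdfj%2Fsdfkjdf%2Fksdfj%20skfj&val=34
```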
An alternative method using furl:
import furl
url = "https://httpbin.org/get?hello,world"
print(url)
url = furl.furl(url).url
print(url)
Output:
https://httpbin.org/get?hello,world
https://httpbin.org/get?hello%2Cworld
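If you would rather not add a dependency, a rough standard-library approximation for this particular case is to split the URL and quote only the query. This is a sketch, not a reimplementation of furl's parsing rules; encode_query is a hypothetical helper name, and keeping '=' and '&' unescaped is an assumption about what the query should preserve.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_query(url: str) -> str:
    # encode_query is a hypothetical helper, not part of furl or urllib.
    # Keep '=' and '&' so key/value structure survives; everything outside
    # the always-safe set (letters, digits, '_.-~') is escaped.
    # Caveat: an already-escaped '%' would be escaped again as '%25'.
    parts = urlsplit(url)
    return urlunsplit(parts._replace(query=quote(parts.query, safe="=&")))


encoded = encode_query("https://httpbin.org/get?hello,world")
print(encoded)  # https://httpbin.org/get?hello%2Cworld
```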