434

If I do

url = "http://example.com?p=" + urllib.quote(query)

I run into two problems:
  1. It doesn't encode / to %2F (breaks OAuth normalization)
  2. It doesn't handle Unicode (it throws an exception)

Is there a better library?

5
  • 5
    These are not URL parameters, FYI. You should clarify. Commented Sep 7, 2018 at 18:53
  • What is the language-agnostic canonical Stack Overflow question? (That is, only covering the encoding, not how it is achieved.) Commented Nov 27, 2022 at 21:54
  • 1
    @JamieMarshall what should they be called then if not URL parameters? Commented Oct 13, 2023 at 16:32
  • 2
    @BenCreasy- attributes. The specification for URL describes parameters as a separate part of the URL (not involving the query string). reference here I can't tell you how much time I've lost trying to authenticate an API because they were asking for parameters and I was sending query attributes. Commented Nov 2, 2023 at 19:12
  • @JamieMarshall even though "attributes" is the real name, I also found this question searching "parameter" in Google, and I would never think about searching for "url attributes" Commented Oct 11 at 12:32

6 Answers

560

From the Python 3 documentation:

urllib.parse.quote(string, safe='/', encoding=None, errors=None)

Replace special characters in string using the %xx escape. Letters, digits, and the characters '_.-~' are never quoted. By default, this function is intended for quoting the path section of a URL. The optional safe parameter specifies additional ASCII characters that should not be quoted — its default value is '/'.

That means passing '' for safe will solve your first issue:

>>> import urllib.parse
>>> urllib.parse.quote('/test')
'/test'
>>> urllib.parse.quote('/test', safe='')
'%2Ftest'

(The function quote was moved from urllib to urllib.parse in Python 3.)
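Applied to the URL from the question, this could look like the sketch below. The helper name build_url is only an illustration (it is not part of urllib), and it assumes Python 3.

```python
from urllib.parse import quote


def build_url(query: str) -> str:
    """Hypothetical helper: percent-encode `query` for use as the p value."""
    # safe="" means no reserved character is left unescaped, so "/" becomes %2F.
    return "http://example.com?p=" + quote(query, safe="")


print(build_url("a/b c"))   # slash and space are both escaped
print(build_url("Müller"))  # non-ASCII text is encoded as UTF-8 first
```

Whether this matches the exact normalization your OAuth library expects is something to check against its spec; the sketch only shows the effect of safe=''.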

By the way, have a look at urlencode.


About the second issue, there was a bug report about it and it was fixed in Python 3.

For Python 2, you can work around it by encoding as UTF-8 like this:

>>> query = urllib.quote(u"Müller".encode('utf8'))
>>> print urllib.unquote(query).decode('utf8')
Müller
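
For comparison, here is a small sketch of the same round trip in Python 3, where no manual encode/decode step should be needed. The latin-1 lines are only there to illustrate the encoding parameter; UTF-8 is the default.

```python
from urllib.parse import quote, unquote

# Python 3: str input is encoded as UTF-8 by default before escaping.
encoded = quote("Müller")
decoded = unquote(encoded)
print(encoded)
print(decoded)

# A different charset can be requested explicitly on both sides.
latin1_encoded = quote("Müller", encoding="latin-1")
latin1_decoded = unquote(latin1_encoded, encoding="latin-1")
print(latin1_encoded)
print(latin1_decoded)
```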

8 Comments

Thank you, both worked great. urlencode just calls quote_plus many times in a loop, which isn't the correct normalization for my task (OAuth).
The spec, RFC 2396, defines these characters as reserved: reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | "," which is what urllib.quote is dealing with.
urllib.parse.quote docs
If you want to retain the colon from http:, do urllib.parse.quote('http://example.com/some path/').replace('%3A', ':')
@chrizonline Just use urllib.parse.quote(url, safe=':/'). Even better, encode some path, then join strings. This is Python, not PHP.
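
The two suggestions in the last comment can be sketched roughly as follows (Python 3 assumed; the base and path values are taken from the comment above, and the variable names are only illustrative):

```python
from urllib.parse import quote

# Option 1: quote the whole URL, but keep ":" and "/" unescaped.
whole = quote("http://example.com/some path/", safe=":/")

# Option 2: escape only the path, then join it to the untouched base.
base = "http://example.com/"
joined = base + quote("some path/")

print(whole)
print(joined)
```

Option 2 avoids having to list every character of the scheme and host that must survive, since those parts never pass through quote at all.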
217

In Python 3, urllib.quote has been moved to urllib.parse.quote, and it does handle Unicode by default.

>>> from urllib.parse import quote
>>> quote('/test')
'/test'
>>> quote('/test', safe='')
'%2Ftest'
>>> quote('/El Niño/')
'/El%20Ni%C3%B1o/'

3 Comments

The name quote is rather vague as a global. It might be nicer to use something like urlencode: from urllib.parse import quote as urlencode.
Note that there is a function named urlencode in urllib.parse already that does something completely different, so you'd be better off picking another name or risk seriously confusing future readers of your code.
(Style suggestion: @Luc, I agree that quote is "rather vague". Rather than renaming it, you can leave the name fully qualified as urllib.parse.quote. That takes a little extra time typing, but saves time reading and maintaining the code.)
65

I think the requests module is much better. It's based on urllib3.

You can try this:

>>> from requests.utils import quote
>>> quote('/test')
'/test'
>>> quote('/test', safe='')
'%2Ftest'

My answer is similar to Paolo's answer.

4 Comments

requests.utils.quote is just a link to Python's own quote. See the requests source.
requests.utils.quote is a thin compatibility wrapper around urllib.quote for Python 2 and urllib.parse.quote for Python 3.
Without reading the comments, this answer creates confusion...
And: why take dependency on an external package, when the functionality is built into Python’s own stdlib?
15

If you're using Django, you can use urlquote:

>>> from django.utils.http import urlquote
>>> urlquote(u"Müller")
u'M%C3%BCller'

Note that changes to Python mean that this is now a legacy wrapper. From the Django 2.1 source code for django.utils.http:

A legacy compatibility wrapper to Python's urllib.parse.quote() function.
(was used for unicode handling on Python 2)

1 Comment

It has been deprecated since Django 3.0.
7

It is better to use urlencode here. There isn't much difference for a single parameter, but, IMHO, it makes the code clearer. (A function named quote_plus looks confusing, especially to those coming from other languages.)

In [21]: query='lskdfj/sdfkjdf/ksdfj skfj'

In [22]: val=34

In [23]: from urllib.parse import urlencode

In [24]: encoded = urlencode(dict(p=query,val=val))

In [25]: print(f"http://example.com?{encoded}")
http://example.com?p=lskdfj%2Fsdfkjdf%2Fksdfj+skfj&val=34

Documentation
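
Note that urlencode uses quote_plus by default, so spaces become +. If you need %20 instead (for example for the OAuth-style normalization mentioned in the question), a possible sketch is to pass quote_via, which is available in Python 3.5+. The parameter values below are just the ones from the session above.

```python
from urllib.parse import quote, urlencode

params = {"p": "lskdfj/sdfkjdf/ksdfj skfj", "val": 34}

# Default behaviour: quote_plus is applied, so the space becomes "+".
default_encoded = urlencode(params)

# quote_via=quote switches to plain percent-encoding; safe="" keeps "/" escaped.
percent_encoded = urlencode(params, quote_via=quote, safe="")

print(default_encoded)
print(percent_encoded)
```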


2

An alternative method using furl:

import furl

url = "https://httpbin.org/get?hello,world"
print(url)
url = furl.furl(url).url
print(url)

Output:

https://httpbin.org/get?hello,world
https://httpbin.org/get?hello%2Cworld
