Encoding string to Windows-1252 URL format in Python 3 [duplicate]

Question

I want to represent all characters in a string as in this table.

But when I do

raw = 'æøå'
encoded = raw.encode('cp1252')
print(encoded)

I get

b'\xe6\xf8\xe5'

What I want is

%E6%F8%E5

as a string for use in a URL.

There's no such thing. 1252 is the Latin codepage. URLs though have their own encoding, unrelated to codepages. You are asking how to URL-encode that string.
– Panagiotis Kanavos, Commented Nov 12, 2018 at 11:13
@PanagiotisKanavos: Latin-1 is a different standard. CP-1252 differs from that standard, don't equate the two. You are completely right about this not being CP1252 encoded output, of course.
– Martijn Pieters, Commented Nov 12, 2018 at 11:18
@MartijnPieters yes, I know, but I'm tired of writing an entire article to describe encodings in comments for the Nth time. The OP is still asking the wrong thing, confusing character codepages with URL encoding.
– Panagiotis Kanavos, Commented Nov 12, 2018 at 11:18
@PanagiotisKanavos: absolutely. And urllib.parse.quote() takes care of encoding for you.
– Martijn Pieters, Commented Nov 12, 2018 at 11:20

Antwane · Accepted Answer · 2018-11-12 11:21:39Z

You have to "quote" your string using urllib tools.

import urllib.parse

raw = 'æøå'
print(urllib.parse.quote(raw, encoding='cp1252'))
# prints %E6%F8%E5
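
As a quick sanity check, here is a minimal sketch that decodes the result again and compares it with what quote() produces when no encoding is given (the default is UTF-8). The variable names are only illustrative.

```python
import urllib.parse

raw = 'æøå'

# Percent-encode using the Windows-1252 byte values of each character.
encoded = urllib.parse.quote(raw, encoding='cp1252')

# Decoding with the same codec should give back the original string.
decoded = urllib.parse.unquote(encoded, encoding='cp1252')

# Without an explicit encoding, quote() uses UTF-8, so each of these
# characters becomes two percent-encoded bytes instead of one.
default_encoded = urllib.parse.quote(raw)

print(encoded)          # %E6%F8%E5
print(decoded == raw)   # True
print(default_encoded)  # %C3%A6%C3%B8%C3%A5
```

Note that the server receiving the URL has to expect cp1252 as well, otherwise it will decode the bytes incorrectly.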

answered Nov 12, 2018 at 11:14


4 Comments

You do not need to encode separately. urllib.parse.quote() takes an encoding argument directly.

Use urllib.parse.quote(raw, encoding='cp1252'), skip the raw.encode() call altogether.

@MartijnPieters Thanks for the information, I just discovered this. Answer updated

Sometimes you do need to handle bytes, at which point I'd recommend you use urllib.parse.quote_from_bytes(), to be explicit about what is being done. Also, the OP probably wants to use quote_plus(), not quote(), as the vast majority of these use cases want the application/x-www-form-urlencoded variant.
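
To illustrate the difference between the functions mentioned in that last comment, here is a rough sketch. The sample string with spaces and the query key 'q' are made up for the example and are not from the question.

```python
import urllib.parse

# Hypothetical sample value containing spaces, to show how they are handled.
raw = 'æøå og mer'

# If you already hold cp1252 bytes, quote_from_bytes() makes that explicit.
data = raw.encode('cp1252')
from_bytes = urllib.parse.quote_from_bytes(data)

# quote_plus() is the form-encoded variant: spaces become '+'.
form_style = urllib.parse.quote_plus(raw, encoding='cp1252')

# urlencode() builds a whole query string and uses quote_plus() by default.
query = urllib.parse.urlencode({'q': raw}, encoding='cp1252')

print(from_bytes)  # %E6%F8%E5%20og%20mer
print(form_style)  # %E6%F8%E5+og+mer
print(query)       # q=%E6%F8%E5+og+mer
```

Which one fits depends on where the value ends up: quote() for path segments, quote_plus() or urlencode() for form data and query strings.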