I want to represent all characters in a string as in this table.

But when I do

raw = 'æøå'
encoded = raw.encode('cp1252')
print(encoded)

I get

>>> b'\xe6\xf8\xe5'

What I want is

>>> %E6%F8%E5

as a string for use in a URL.

  • There's no such thing. 1252 is the Latin codepage. URLs though have their own encoding, unrelated to codepages. You are asking how to URL-encode that string. Commented Nov 12, 2018 at 11:13
  • @PanagiotisKanavos: Latin-1 is a different standard. CP-1252 differs from that standard, don't equate the two. You are completely right about this not being CP1252 encoded output, of course. Commented Nov 12, 2018 at 11:18
  • @MartijnPieters yes, I know but I'm tired of writing an entire article to describe encodings in comments for the Nth time. The OP is still asking the wrong thing, confusing character codepages for URL encoding. Commented Nov 12, 2018 at 11:18
  • @PanagiotisKanavos: absolutely. And urllib.parse.quote() takes care of encoding for you. Commented Nov 12, 2018 at 11:20

1 Answer

You have to "quote" your string using urllib tools.

import urllib.parse

raw = 'æøå'
print(urllib.parse.quote(raw, encoding='cp1252'))
# prints %E6%F8%E5
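
To make the connection to the question's raw.encode('cp1252') attempt explicit, here is a minimal sketch of the two steps that quote() combines: the codepage turns characters into bytes, and percent-encoding writes each of those bytes as %XX. The by_hand variable is only an illustration and not how urllib is implemented; it matches quote() for this string because none of the three bytes is an ASCII character that quote() would leave unescaped.

```python
import urllib.parse

raw = 'æøå'

# Step 1: the codepage maps characters to bytes.
encoded = raw.encode('cp1252')  # b'\xe6\xf8\xe5'

# Step 2: percent-encoding writes every byte as %XX (illustration only).
by_hand = ''.join(f'%{byte:02X}' for byte in encoded)

# quote() performs both steps in a single call.
quoted = urllib.parse.quote(raw, encoding='cp1252')

# unquote() reverses it, given the same encoding.
round_trip = urllib.parse.unquote(quoted, encoding='cp1252')

print(by_hand)     # %E6%F8%E5
print(quoted)      # %E6%F8%E5
print(round_trip)  # æøå
```

The receiving server has to decode with the same codepage; if it expects UTF-8, these bytes will not decode to the original characters.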

4 Comments

You do not need to encode separately. urllib.parse.quote() takes an encoding argument directly.
Use urllib.parse.quote(raw, encoding='cp1252'), skip the raw.encode() call altogether.
@MartijnPieters Thanks for the information, I just discovered this. Answer updated
Sometimes you do need to handle bytes, at which point I'd recommend you use urllib.parse.quote_from_bytes(), to be explicit about what is being done. Also, the OP probably wants to use quote_plus(), not quote(), as the vast majority of these want the application/x-www-form-urlencoded content type variant.
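
The variants mentioned in the last comment can be compared side by side. This is a small sketch, assuming a made-up sample string containing a space and a slash (not from the question) so that the differences between the functions are visible; the parameter name q is likewise hypothetical.

```python
import urllib.parse

# Hypothetical sample with a space and a slash to expose the differences.
text = 'æ ø/å'
data = text.encode('cp1252')

# quote_from_bytes() takes bytes directly; '/' is left alone by default.
from_bytes = urllib.parse.quote_from_bytes(data)

# quote_plus() targets form encoding: space becomes '+', '/' is escaped.
plus = urllib.parse.quote_plus(text, encoding='cp1252')

# urlencode() builds a whole query string and uses quote_plus() by default.
query = urllib.parse.urlencode({'q': text}, encoding='cp1252')

print(from_bytes)  # %E6%20%F8/%E5
print(plus)        # %E6+%F8%2F%E5
print(query)       # q=%E6+%F8%2F%E5
```

Which one fits depends on where the value ends up: quote() or quote_from_bytes() for a path segment, quote_plus() or urlencode() for query strings and form bodies.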
