2

I'm having trouble encoding accented characters in a URL from the Python command line. Reduced to its essentials, my problem is this code:

>>> import urllib
>>> print urllib.urlencode({'foo' : raw_input('> ')})
> áéíóúñ

prints this in a Mac command line:

foo=%C3%A1%C3%A9%C3%AD%C3%B3%C3%BA%C3%B1

but the same code prints this in the Windows command line:

foo=%A0%82%A1%A2%A3%A4

The Mac result is correct and the characters get encoded as needed, but in Windows I get gibberish.

I'm guessing the problem lies in the way Windows encodes characters, but I haven't been able to find a solution. I'd be very grateful if you could help me. Thanks in advance!

2 Answers

3

You can encode explicitly to get a consistent result on every platform.

>>> import urllib
>>> text = u"áéíóúñ"  # avoid naming the variable str, which shadows the builtin
>>> urllib.urlencode({'foo': text.encode('utf-8')})
'foo=%C3%A1%C3%A9%C3%AD%C3%B3%C3%BA%C3%B1'

However, you need to make sure your string is unicode first, so it may need decoding if it's not, for example raw_input().decode('latin1') or raw_input().decode('utf-8'), depending on the encoding the input arrived in.

The input encoding depends on the console's locale, I believe, so it's system-specific.

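EDIT: calling unicode() on a byte string is not a reliable shortcut here. In Python 2 it decodes with the interpreter's default encoding (normally ASCII), not the console's locale encoding, so it raises UnicodeDecodeError on accented input. Decode explicitly instead, for example with sys.stdin.encoding.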
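To see why the two consoles disagree, here is a minimal sketch that percent-encodes the same text after encoding it with two different codecs. The helper name percent_encode is made up for illustration, and the try/except import is only there so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3

# The six accented characters from the question, written as escapes
# so the snippet does not depend on the source file's encoding.
TEXT = u"\xe1\xe9\xed\xf3\xfa\xf1"


def percent_encode(text, codec):
    """Hypothetical helper: encode text to bytes with codec, then percent-encode."""
    return quote(text.encode(codec))


for codec in ("utf-8", "cp437"):
    print("%s -> %s" % (codec, percent_encode(TEXT, codec)))
```

The utf-8 line matches the Mac output in the question and the cp437 line matches the Windows output, which suggests the text is the same on both systems and only the bytes handed to urlencode differ.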


2

On US Windows the command line uses the cp437 code page, so raw_input() returns cp437-encoded bytes. You need UTF-8, so decode from cp437 and re-encode before URL-encoding:

>>> import sys
>>> import urllib
>>> sys.stdin.encoding
'cp437'
>>> print urllib.urlencode({'foo':raw_input('> ').decode('cp437').encode('utf8')})
> áéíóúñ
foo=%C3%A1%C3%A9%C3%AD%C3%B3%C3%BA%C3%B1
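The same decode-then-encode step can be wrapped so the console encoding is passed in rather than hard-coded. This is only a sketch: urlencode_from_console is a hypothetical name, the 'foo' key is taken from the question, and the import fallback is there so it runs on both Python 2 and Python 3.

```python
try:
    from urllib import urlencode          # Python 2
except ImportError:
    from urllib.parse import urlencode    # Python 3


def urlencode_from_console(raw_bytes, console_encoding):
    """Hypothetical helper: turn console bytes into a UTF-8 query string.

    raw_bytes is the byte string as the console delivered it, and
    console_encoding is the codec the console used (e.g. 'cp437').
    """
    text = raw_bytes.decode(console_encoding)        # bytes -> unicode
    return urlencode({'foo': text.encode('utf-8')})  # unicode -> UTF-8 -> %XX


# The cp437 bytes a US Windows console would deliver for the question's input.
print(urlencode_from_console(b'\xa0\x82\xa1\xa2\xa3\xa4', 'cp437'))
```

On Python 2 you would call it as urlencode_from_console(raw_input('> '), sys.stdin.encoding). Note that sys.stdin.encoding can be None when input is piped rather than typed, so a fallback codec may be needed in that case.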
