I use Python 2.7. This page says that:

Python’s default encoding is the ‘ascii’ encoding

Indeed I have the following:

>>> import sys
>>> sys.getdefaultencoding()
'ascii'

But I open my interpreter and type this:

>>> 'É'
'\xc3\x89'

It looks like utf8:

>>> u'É'.encode( 'utf8' )
'\xc3\x89'

What happened? Did the default ascii raise UnicodeEncodeError? Did it trigger utf8 encoding?

1 Answer

Your terminal is configured to use UTF-8, so it sends UTF-8 bytes to Python, and Python stores those bytes unchanged in a byte string.

When you then print that byte string, the terminal interprets those bytes as UTF-8 again.

At no point is Python actually interpreting these bytes as anything other than raw bytes; no decoding or encoding takes place at the Python level.
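To make the "raw bytes" point concrete, here is a minimal sketch. It assumes the terminal sent the same two bytes shown in the question; the variable names are only illustrative.

```python
# The two bytes a UTF-8 terminal sends for the character É (U+00C9).
raw = b'\xc3\x89'

# Python holds these as plain bytes; nothing has been decoded yet.
byte_count = len(raw)

# Decoding only happens when asked for explicitly, with a named codec.
text = raw.decode('utf-8')
char_count = len(text)

# Encoding the character again gives back the original bytes.
round_trip = text.encode('utf-8')
```

The byte string has length 2 while the decoded text has length 1, which is the difference between what the terminal sent and the character it stands for.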

If you tried to decode the bytes implicitly, an exception would be raised:

>>> unicode('\xc3\x89')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

Here Python used sys.getdefaultencoding(), which is 'ascii', and the decoding failed because the byte 0xc3 is outside the ASCII range.
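The same failure can be reproduced by naming the codec explicitly, which is roughly what the implicit conversion does with the default encoding. This is only a sketch; try_decode is a hypothetical helper, not a standard library function.

```python
raw = b'\xc3\x89'

def try_decode(data, encoding):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc

# ASCII cannot represent byte 0xc3, so this decode fails at position 0.
ascii_text, ascii_error = try_decode(raw, 'ascii')

# Naming the codec the terminal actually used succeeds.
utf8_text, utf8_error = try_decode(raw, 'utf-8')
```

Here ascii_error carries the same details as the traceback (codec 'ascii', position 0), while utf8_text is the single character É.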

When you type Unicode literals (u'...') at the interactive prompt, Python decodes the stdin input not with sys.getdefaultencoding() but with the sys.stdin.encoding value:

>>> import sys
>>> sys.stdin.encoding
'UTF-8'

which Python takes either from the PYTHONIOENCODING environment variable (if set), or from locale.getpreferredencoding():

>>> import locale
>>> locale.getpreferredencoding()
'UTF-8'
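One way to see the environment variable taking effect is to start a child interpreter with it set and ask that child for its sys.stdin.encoding. This is a rough sketch: child_stdin_encoding is a hypothetical helper, 'latin-1' is an arbitrary choice, and the exact spelling of the reported name may vary between Python versions, so the names are compared through codecs.lookup().

```python
import codecs
import os
import subprocess
import sys

def child_stdin_encoding(ioencoding):
    """Run a child interpreter with PYTHONIOENCODING set and
    return the sys.stdin.encoding it reports."""
    env = dict(os.environ, PYTHONIOENCODING=ioencoding)
    proc = subprocess.Popen(
        [sys.executable, '-c', 'import sys; print(sys.stdin.encoding)'],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        env=env,
    )
    out, _ = proc.communicate()
    return out.decode('ascii').strip()

reported = child_stdin_encoding('latin-1')

# Compare normalized codec names rather than raw spellings.
same_codec = codecs.lookup(reported).name == codecs.lookup('latin-1').name
```

Without the variable, the child would fall back to whatever the locale (or the platform default) provides.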

When reading Python source files, Python 2 uses ASCII to interpret such literals by default, while Python 3 uses UTF-8. Both can be told which codec to use instead with a PEP 263 source encoding comment, which has to be on the first or second line of your input file:

# coding: UTF-8
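To illustrate what the declaration changes, the sketch below compiles small in-memory sources with different coding comments instead of reading real files. literal_from_source is a hypothetical helper written for this example; the source is passed to compile() as bytes so the declaration is what decides how the literal is decoded.

```python
def literal_from_source(encoding, literal_bytes):
    """Compile a tiny source that declares `encoding` and contains
    `literal_bytes` inside a u'...' literal; return the resulting text."""
    source = (
        b"# coding: " + encoding.encode('ascii') + b"\n"
        b"text = u'" + literal_bytes + b"'\n"
    )
    namespace = {}
    exec(compile(source, '<demo>', 'exec'), namespace)
    return namespace['text']

# The same character, stored with two encodings, each declared correctly.
from_latin1 = literal_from_source('latin-1', b'\xc9')
from_utf8 = literal_from_source('utf-8', b'\xc3\x89')

# UTF-8 bytes under a latin-1 declaration are read as two characters.
mismatched = literal_from_source('latin-1', b'\xc3\x89')
```

Both correctly declared sources produce the single character É, while the mismatched one produces two characters, which is the usual symptom of a wrong or missing declaration.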

1 Comment

Note: there is os.device_encoding(fd) in Python 3. It is worth mentioning explicitly that changing the source-code encoding has no effect on sys.getdefaultencoding(). The REPL uses locale.getpreferredencoding() (which is why non-ASCII characters inside literals may work by default without the encoding declaration), and that may differ from sys.stdin.encoding.
