
I'm using Windows and Linux machines for the same project. The default encoding for stdin on Windows is cp1252, and on Linux it is utf-8.

I would like to change everything to utf-8. Is it possible? How can I do it?

This question is about Python 2; for Python 3, see "Python 3: How to specify stdin encoding".

4 Answers


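You can do this by not relying on the implicit encoding when printing things. Not relying on it is a good idea in any case: in Python 2 the implicit encoding is only used when printing unicode objects to stdout while stdout is connected to a terminal. When the output is piped or redirected, sys.stdout.encoding is None and Python 2 falls back to ASCII, so printing non-ASCII text raises UnicodeEncodeError.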

A better approach is to use unicode everywhere and to do all I/O through codecs.open or codecs.getwriter. For example, you can wrap sys.stdout in an object that automatically encodes your unicode strings as UTF-8:

import codecs, sys
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)

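This will only work if you use unicode everywhere, though: if you write a byte string containing non-ASCII characters to the wrapped stream, Python 2 first tries to decode it as ASCII and raises UnicodeDecodeError. So, use unicode everywhere. Really, everywhere.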
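As a minimal sketch of what the wrapper does, the snippet below applies codecs.getwriter to an in-memory byte stream instead of the real stdout, so the encoded bytes can be inspected. The names raw, out and encoded are illustrative only, and io.BytesIO is just a stand-in for a byte-oriented stdout (sys.stdout in Python 2, sys.stdout.buffer in Python 3).

```python
import codecs
import io

# Stand-in for a byte-oriented stdout; hypothetical, for inspection only.
raw = io.BytesIO()

# Wrap the byte stream so that text written to it is encoded as UTF-8.
out = codecs.getwriter('utf-8')(raw)

# Write a unicode string containing one non-ASCII character (e with acute).
out.write(u'caf\u00e9\n')

# The underlying stream now holds the UTF-8 bytes of that text.
encoded = raw.getvalue()
```

The non-ASCII character ends up as the two-byte sequence 0xC3 0xA9, which is its UTF-8 encoding regardless of the platform's default code page.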


4 Comments

stdin isn't decoded automatically, so you always have to do this yourself. And assuming the input is UTF-8 is probably a bad idea, but there's codecs.getreader('utf-8')(sys.stdin) if you really want to.
Note that in contrast to Python 2, Python 3 actually automatically decodes stdin: docs.python.org/3/library/sys.html#sys.stdin -- this behavior can be changed as outlined in the docs.
Is there any way in Python 3 to forcibly change the encoding of STDIN regardless of the environment variables?
In Python 3.8 codecs.getreader('utf-8')(sys.stdin) does not work. Use codecs.getreader('utf-8')(sys.stdin.buffer) and codecs.getwriter('utf8')(sys.stdout.buffer) instead.
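For the Python 3 case raised in these comments, one possible approach (a sketch, not the only way) is to re-wrap the binary buffer with io.TextIOWrapper and an explicit encoding, which does not depend on environment variables or the locale. The helper name utf8_text_stream is hypothetical, and io.BytesIO stands in for sys.stdin.buffer so the example can run without real piped input. On Python 3.7 and later, sys.stdin.reconfigure(encoding='utf-8') is another option.

```python
import io


def utf8_text_stream(byte_stream, errors='strict'):
    """Return a text wrapper that reads/writes byte_stream as UTF-8."""
    return io.TextIOWrapper(byte_stream, encoding='utf-8', errors=errors)


# In a real Python 3 script this would be something like:
#     sys.stdin = utf8_text_stream(sys.stdin.buffer)
# Here an in-memory buffer plays the role of sys.stdin.buffer (assumption).
fake_stdin_buffer = io.BytesIO('Gr\u00fc\u00dfe\n'.encode('utf-8'))

stdin_text = utf8_text_stream(fake_stdin_buffer)
first_line = stdin_text.readline()  # decoded as UTF-8, whatever the locale is
```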

This is an old question, but just for reference.

To read UTF-8 from stdin, use:

import codecs
import sys

UTF8Reader = codecs.getreader('utf8')
sys.stdin = UTF8Reader(sys.stdin)

# Then, e.g. (Python 2 syntax); each line is now a unicode object:
for line in sys.stdin:
    print line.strip()

To write UTF-8 to stdout, use:

import codecs
import sys

UTF8Writer = codecs.getwriter('utf8')
sys.stdout = UTF8Writer(sys.stdout)

# Then, e.g. (Python 2 syntax); pass unicode objects, not byte strings:
print u'Anything'
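To see the reader and the writer working together without touching the real streams, here is a small sketch that uses in-memory byte buffers as stand-ins for stdin and stdout. The names raw_in, raw_out, reader, writer and lines are made up for the illustration.

```python
import codecs
import io

# Stand-ins for the byte streams behind stdin and stdout (assumption).
raw_in = io.BytesIO(u'na\u00efve\nr\u00e9sum\u00e9\n'.encode('utf-8'))
raw_out = io.BytesIO()

reader = codecs.getreader('utf8')(raw_in)    # bytes -> unicode on read
writer = codecs.getwriter('utf8')(raw_out)   # unicode -> bytes on write

lines = []
for line in reader:
    text = line.strip()                 # a unicode string, already decoded
    lines.append(text)
    writer.write(text.upper() + u'\n')  # encoded back to UTF-8 on the way out
```

Because the text is decoded before it is processed, string operations such as upper() act on whole characters rather than on individual bytes.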

1 Comment

In Python 3.8 codecs.getreader('utf-8')(sys.stdin) (equivalent to this post) does not work. Use codecs.getreader('utf-8')(sys.stdin.buffer) and codecs.getwriter('utf8')(sys.stdout.buffer) instead.

Python tries to detect the encoding of stdin automatically. The simplest way I have found to specify an encoding when automatic detection isn't working properly is to set the PYTHONIOENCODING environment variable, as in the following example:

pipeline | PYTHONIOENCODING="UTF-8" /path/to/your-script.py

For more information about encoding detection and this variable on different platforms you can look at the sys.stdin documentation.
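A quick way to check the effect of the variable is to start a child interpreter with PYTHONIOENCODING set and have it report the encodings it picked up. This is only a sketch: it assumes Python 3.5 or later for subprocess.run, and child_code and reported are illustrative names.

```python
import os
import subprocess
import sys

# Copy the current environment and force the I/O encoding for the child.
env = dict(os.environ, PYTHONIOENCODING='utf-8')

# The child prints the encodings it chose for its standard streams.
child_code = 'import sys; print(sys.stdin.encoding); print(sys.stdout.encoding)'

result = subprocess.run(
    [sys.executable, '-c', child_code],
    env=env,
    input=b'',               # stdin is a pipe, not a terminal
    stdout=subprocess.PIPE,  # stdout is a pipe as well
    check=True,
)

# One encoding name per line: stdin first, then stdout.
reported = result.stdout.decode('ascii').split()
```

Both streams are pipes here, which is the situation where relying on automatic detection is most likely to give a platform-dependent answer.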


A simple snippet that works for me on Ubuntu with Python 2.7 and Python 3.6:

import codecs
import sys  # needed for sys.stdin / sys.stdout below

if sys.version_info.major == 2:  # Python 2: the standard streams are byte streams
    # for stdin
    UTF8Reader = codecs.getreader('utf8')
    sys.stdin = UTF8Reader(sys.stdin)
    # for stdout
    UTF8Writer = codecs.getwriter('utf8')
    sys.stdout = UTF8Writer(sys.stdout)
elif sys.version_info.major == 3:  # Python 3: wrap the underlying binary buffers
    # for stdin
    UTF8Reader = codecs.getreader('utf8')
    sys.stdin = UTF8Reader(sys.stdin.buffer)
    # for stdout
    UTF8Writer = codecs.getwriter('utf8')
    sys.stdout = UTF8Writer(sys.stdout.buffer)
