How to write utf8 to standard output in a way that works with python2 and python3

Question

I want to write a non-ASCII character, let's say →, to standard output. The tricky part seems to be that some of the data I want to concatenate to that string is read from JSON. Consider the following simple JSON document:

{"foo":"bar"}

I include this because if I just want to print → then it seems enough to simply write:

print("→")

and it will do the right thing in python2 and python3.

So I want to print the value of foo together with my non-ASCII character →. The only way I found to do this so that it works in both python2 and python3 is:

getattr(sys.stdout, 'buffer', sys.stdout).write(data["foo"].encode("utf8")+u"→".encode("utf8"))

or

getattr(sys.stdout, 'buffer', sys.stdout).write((data["foo"]+u"→").encode("utf8"))

It is important to not miss the u in front of → because otherwise a UnicodeDecodeError will be thrown by python2.
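A minimal sketch of why the u prefix matters, assuming the data comes from json.loads (which returns unicode strings on python2). The names arrow_bytes, outcome and encoded are only for illustration, and the byte string stands in for what a plain "→" literal is in a python2 source file:

```python
# -*- coding: utf-8 -*-
import json

data = json.loads('{"foo": "bar"}')

# Stand-in for a "→" literal without the u prefix on python2 (a byte string).
arrow_bytes = u"\u2192".encode("utf-8")

try:
    data["foo"] + arrow_bytes
    outcome = "no error"
except UnicodeDecodeError:
    # python2: the byte string is implicitly decoded as ASCII and fails.
    outcome = "UnicodeDecodeError"
except TypeError:
    # python3: str and bytes cannot be concatenated at all.
    outcome = "TypeError"

# Concatenating two unicode strings and encoding once works on both.
encoded = (data["foo"] + u"\u2192").encode("utf-8")
```

Either way the mixed concatenation fails, so keeping everything unicode until the final encode seems to be the safer pattern.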

Using the print function like this:

print((data["foo"]+u"→").encode("utf8"), file=(getattr(sys.stdout, 'buffer', sys.stdout)))

doesn't seem to work, because python3 will complain: TypeError: 'str' does not support the buffer interface.

Did I find the best way or is there a better option? Can I make the print function work?

For your last example that calls print, in Python 3 encoding the string returns bytes. Since print requires a string, it calls the __str__ method, which for bytes just returns a repr, i.e. str("→".encode()) == "b'\\xe2\\x86\\x92'". Next print writes this useless repr to the file, but the BufferedWriter requires an object that supports the buffer interface, such as bytes.
– Eryk Sun, Commented May 30, 2014 at 2:13
@eryksun thank you! As print() is able to print all kinds of datatypes without explicit conversion to str, I didn't think it would choke on bytes.
– josch, Commented May 30, 2014 at 6:37
Printing has to first get an object as a string. This doesn't choke on Python 3 bytes. Decoding bytes using a default encoding would be wrong in general, since a bytes object isn't necessarily text. I just meant the repr string is "useless" for your needs. What choked is trying to print to a BufferedWriter, e.g. print('abc', file=sys.stdout.buffer).
– Eryk Sun, Commented May 30, 2014 at 7:16
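The behaviour described in these comments can be reproduced with a short Python 3 sketch. Here io.BytesIO is used as a stand-in for sys.stdout.buffer (an assumption made so the example does not touch the real stdout), and the variable names are illustrative:

```python
import io

arrow = u"\u2192"
encoded = arrow.encode("utf-8")

# str() of a bytes object is its repr, not the decoded text.
as_text = str(encoded)

# print() always hands str objects to file.write(), so a binary stream rejects them.
binary_stream = io.BytesIO()
try:
    print("abc", file=binary_stream)
    print_error = None
except TypeError as exc:
    print_error = exc

# Writing the bytes directly is what a binary stream expects.
binary_stream.write(encoded)
```

The exact wording of the TypeError differs between Python 3 releases, so the sketch only captures the exception rather than comparing its message.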

snapshoe · Accepted Answer · 2014-06-08 08:42:07Z

The most concise I could come up with is the following, which you may be able to make more concise with a few convenience functions (or even replacing/overriding the print function):

# -*- coding=utf-8 -*-
import codecs
import os
import sys

# if you include the -*- coding line, you can use this
output = 'bar' + u'→'
# otherwise, use this
output = 'bar' + b'\xe2\x86\x92'.decode('utf-8')

if sys.stdout.encoding == 'UTF-8':
    print(output)
else:
    output += os.linesep
    if sys.version_info[0] >= 3:
        sys.stdout.buffer.write(output.encode('utf-8'))
    else:
        codecs.getwriter('utf-8')(sys.stdout).write(output)

The best option is to use the -*- coding line, which lets you put the actual character in the source file. If for some reason you can't use the coding line, it is still possible to do this without it.

This (both with and without the encoding line) works on Linux (Arch) with python 2.7.7 and 3.4.1. It also works if the terminal's encoding is not UTF-8. (On Arch Linux, I just change the encoding by using a different LANG environment variable.)

LANG=zh_CN python test.py

It also sort of works on Windows, which I tried with 2.6, 2.7, 3.3, and 3.4. By sort of, I mean I could get the '→' character to display only on a mintty terminal. On a cmd terminal, that character would display as 'ΓåÆ'. (There may be something simple I'm missing there.)
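One possible explanation for the 'ΓåÆ' output, offered as an assumption rather than something the answer confirms: if the cmd console was interpreting bytes as code page 437, the three UTF-8 bytes of '→' would each be shown as a separate character. A small sketch (variable names are illustrative):

```python
# -*- coding: utf-8 -*-
arrow = u"\u2192"                     # the character →
utf8_bytes = arrow.encode("utf-8")    # three bytes: e2 86 92

# Decode the same bytes the way a cp437 console would (assumed code page).
mojibake = utf8_bytes.decode("cp437")
```

Under that assumption, mojibake is the three-character string Γ, å, Æ, which matches what was seen on cmd.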


8 Comments

ohmu Over a year ago

With regard to Windows only sort of working, would changing 'utf-8' to sys.stdout.encoding print any better?

snapshoe Over a year ago

No. That would be the same as simply doing a print. If you don't change the encoding, sys.stdout.encoding is the one print uses, which is why all the work goes into changing it from its default.

snapshoe Over a year ago

As an experiment, try the code here. It will show the effect of the encoding used on a terminal for all available encodings-- for ones that don't throw exceptions. I ran this on Windows & Linux, 2.7 & 3.4.

Martijn Pieters Over a year ago

I cannot stress enough how important it is to ensure your terminal or console is correctly configured. It should not be Python's job to ensure this. Personally, I'd use output = output.encode('utf-8'), try:, sys.stdout.buffer.write(output), except AttributeError:, sys.stdout.write(output); codecs.getwriter() is overkill here, and you need to test for features, not versions. You can use the io module in Python 2 as well so sys.stdout could actually have the .buffer attribute there too.
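The feature test suggested in this comment could be sketched roughly as follows. The helper name write_utf8 and its stream parameter are hypothetical, and the sketch assumes text is a unicode string (u"..." on python2):

```python
# -*- coding: utf-8 -*-
import sys


def write_utf8(text, stream=None):
    """Write text to stream (default: sys.stdout) as UTF-8 encoded bytes."""
    if stream is None:
        stream = sys.stdout
    data = text.encode('utf-8')
    try:
        # Text streams built on the io module expose their binary layer.
        target = stream.buffer
    except AttributeError:
        # python2 file objects (and plain binary streams) take bytes directly.
        target = stream
    target.write(data)
```

A call such as write_utf8(u"bar→\n") would then send the same bytes on either version, regardless of sys.stdout.encoding. Whether the terminal displays them correctly still depends on how it is configured.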

snapshoe Over a year ago

@MartijnPieters Is there a tutorial or reference on how to correctly configure a console/terminal (cmd/powershell/other?) on Windows?

Addison · Accepted Answer · 2014-06-07 08:16:15Z

If you don't need to print to sys.stdout.buffer, then the following should print fine to sys.stdout. I tried it in both Python 2.7 and 3.4, and it seemed to work fine:

# -*- coding=utf-8 -*-
print("bar" + u"→")


3 Comments

snapshoe Over a year ago

This does not work if sys.stdout.encoding != "UTF-8", such as on Windows.

rds Over a year ago

@snapshoe It is obvious that it will not be displayed properly if the output goes to something with limited capabilities. But Python does write to the output in UTF-8, and the OP wanted to send the output in a file, it seems.

snapshoe Over a year ago

@rds I don't see any mention of outputting to a file. I do see mentioned everywhere, including the title of the post, about printing to stdout.
