I'm trying to find a generic way to print Unicode strings from a Python script.
The requirements are that it must run on both Python 2.7 and 3.x, on any platform, and under any terminal settings and environment variables (e.g. LANG=C or LANG=en_US.UTF-8).
Python's print function automatically encodes text with the terminal's encoding (sys.stdout.encoding) when printing, so if that encoding is ASCII it fails on any non-ASCII character.
For example, the following works when the environment has "LANG=en_US.UTF-8":
x = u'\xea'
print(x)
But it fails in Python 2.7 when "LANG=C":
UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 0: ordinal not in range(128)
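To see why this happens, you can check whether a string survives the encoding that print() will use for a given stream. The sketch below is a hypothetical helper (stream_can_encode is not a standard function); the in-memory io.TextIOWrapper objects are an assumption made only so the example runs anywhere, standing in for an ASCII-only and a UTF-8 terminal.

```python
import io
import sys


def stream_can_encode(text, stream=None):
    """Return True if `text` can be encoded with the stream's encoding.

    Hypothetical helper. When stream.encoding is None (Python 2 with a
    piped stdout), Python 2 falls back to its default encoding, normally
    ASCII, so None is treated as 'ascii' here as well.
    """
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, 'encoding', None) or 'ascii'
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        # LookupError: the stream reports a codec name Python does not know
        return False
    return True


# In-memory stand-ins for an ASCII-only and a UTF-8 terminal.
ascii_term = io.TextIOWrapper(io.BytesIO(), encoding='ascii')
utf8_term = io.TextIOWrapper(io.BytesIO(), encoding='utf-8')

print(stream_can_encode(u'\xea', ascii_term))  # False
print(stream_can_encode(u'\xea', utf8_term))   # True
```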
The following works in Python 2.7 regardless of the LANG setting, but it would not display Unicode characters correctly if the terminal used an encoding other than UTF-8:
print(x.encode('utf-8'))
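In Python 3 that same line prints the repr of the bytes object (b'\xc3\xaa') instead of the character, so it only behaves this way on 2.7. A rough sketch of writing raw UTF-8 bytes on both versions, assuming that a Python 3 text stream exposes its binary layer as .buffer (true for sys.stdout) and that Python 2's sys.stdout accepts byte strings; write_utf8_line is a hypothetical name:

```python
import io
import sys


def write_utf8_line(text, stream=None):
    """Write `text` plus a newline to `stream` as raw UTF-8 bytes (hypothetical helper)."""
    if stream is None:
        stream = sys.stdout
    data = (text + u'\n').encode('utf-8')
    # Python 3 text streams expose their binary layer as `.buffer`;
    # Python 2's sys.stdout is byte-oriented and has no `.buffer`.
    target = getattr(stream, 'buffer', stream)
    stream.flush()  # push out any text already queued in the text layer first
    target.write(data)
    target.flush()


raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding='ascii')
write_utf8_line(u'\xea', fake_stdout)
print(raw.getvalue())  # b'\xc3\xaa\n' on Python 3
```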
The desired behavior is to show Unicode characters in the terminal whenever possible, and to fall back to some default encoding when the terminal does not support Unicode. For example, the output would be UTF-8 encoded if the terminal only supported ASCII. Basically, the goal is to behave exactly like the print function when it works, and to use a default encoding in the cases where it fails.
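One way to approximate that is to try print() first and fall back to writing UTF-8 bytes only when it raises UnicodeEncodeError. This is a sketch, not a tested library function: safe_print and its fallback parameter are hypothetical names, and it assumes the target is sys.stdout (or another text stream whose binary layer is reachable via .buffer on Python 3).

```python
from __future__ import print_function

import io
import sys


def safe_print(text, stream=None, fallback='utf-8'):
    """Print `text` like print(); if the stream's encoding cannot represent
    it, write the `fallback`-encoded bytes to the binary layer instead of
    raising UnicodeEncodeError. Hypothetical helper.
    """
    if stream is None:
        stream = sys.stdout
    try:
        # Normal path: identical to print() whenever the encoding suffices.
        print(text, file=stream)
    except UnicodeEncodeError:
        data = (text + u'\n').encode(fallback)
        # Python 3: write to the binary layer; Python 2: sys.stdout takes bytes.
        target = getattr(stream, 'buffer', stream)
        stream.flush()
        target.write(data)
        target.flush()


# In-memory stand-in for an ASCII-only terminal (Python 3 only; on
# Python 2, pass the real sys.stdout rather than an io.TextIOWrapper).
ascii_term = io.TextIOWrapper(io.BytesIO(), encoding='ascii')
safe_print(u'caf\xe9', ascii_term)  # writes UTF-8 bytes instead of raising
```

Because the fallback only triggers on UnicodeEncodeError, a UTF-8 terminal gets exactly what print() would produce. The trade-off is that a terminal using some other non-UTF-8 encoding (e.g. latin-1) will show the UTF-8 bytes as garbage for characters it cannot represent.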
A program running in an ASCII-only environment (such as the C locale) should present the user with a reasonable default, such as UTF-8, not with a UnicodeEncodeError and a traceback. The former can produce garbage at worst (but will do what the user wants on modern systems, whose terminals almost always expect UTF-8), whereas the latter is bound to frustrate the user.
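Following that argument, another option is to replace sys.stdout once at start-up with a UTF-8 writer whenever its encoding is ASCII (or unknown), so that plain print() never raises. This sketch assumes only ASCII-reporting streams need replacing; utf8_fallback_stream is a hypothetical helper name.

```python
import codecs
import io


def utf8_fallback_stream(stream):
    """Return `stream` unchanged unless its encoding is ASCII or unknown;
    in that case return a wrapper that encodes all output as UTF-8.
    Hypothetical helper.
    """
    encoding = getattr(stream, 'encoding', None)
    if encoding and codecs.lookup(encoding).name != 'ascii':
        return stream  # e.g. a UTF-8 terminal: keep print()'s normal behaviour
    if hasattr(stream, 'buffer'):
        # Python 3: a new UTF-8 text layer on top of the same binary stream.
        stream.flush()
        return io.TextIOWrapper(stream.buffer, encoding='utf-8',
                                line_buffering=True)
    # Python 2: the StreamWriter encodes unicode to UTF-8 before writing bytes.
    return codecs.getwriter('utf-8')(stream)


# In-memory stand-in for an ASCII-only terminal.
raw = io.BytesIO()
ascii_term = io.TextIOWrapper(raw, encoding='ascii')
out = utf8_fallback_stream(ascii_term)
out.write(u'\xea\n')
out.flush()
```

In a script you would call it once, e.g. sys.stdout = utf8_fallback_stream(sys.stdout), and keep using plain print(). On Python 3.7+ alone, sys.stdout.reconfigure(encoding='utf-8') changes the encoding in place instead.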