I've just started to learn Python, but I've already run into trouble.
I have a simple script with just one command:

#!/usr/bin/env python3
print("Příliš žluťoučký kůň úpěl ďábelské ódy.") # Text in Czech 

When I try to run this script:

python3 hello.py 

I get this message:

Traceback (most recent call last):
  File "hello.py", line 2, in <module>
    print("P\u0159\xedli\u0161 \u017elu\u0165ou\u010dk\xfd k\u016fn \xfap\u011bl \u010f\xe1belsk\xe9 \xf3dy.")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)

I am using Kubuntu 16.04 and Python 3.5.2. When I ran export PYTHONIOENCODING=utf-8, it worked, but only temporarily: the next time I opened bash, I got the same error.
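
An export only lasts for the shell session it is run in, which would explain why the fix disappeared. To see what Python is actually working with, a small diagnostic along these lines may help. This is only a sketch: the function name describe_io_encoding and the report keys are made up for illustration, and it uses nothing beyond the standard os, sys and locale modules.

```python
import locale
import os
import sys


def describe_io_encoding():
    """Collect the settings that influence how print() encodes text."""
    # Environment variables that can affect the encoding Python picks.
    names = ("PYTHONIOENCODING", "LC_ALL", "LC_CTYPE", "LANG")
    report = {name: os.environ.get(name) for name in names}
    # The encoding print() will use for standard output.
    report["stdout encoding"] = getattr(sys.stdout, "encoding", None)
    # The encoding derived from the current locale settings.
    report["preferred encoding"] = locale.getpreferredencoding(False)
    return report


if __name__ == "__main__":
    for key, value in describe_io_encoding().items():
        print("{}: {}".format(key, value))
```

If the stdout encoding reported here is an ASCII variant rather than UTF-8, the terminal and the source file are probably not the culprit.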

According to https://docs.python.org/3/howto/unicode.html#the-string-type the default encoding for Python source code is UTF-8.
So I have the source file saved in UTF-8 and Konsole is set to UTF-8, but I still get the error!
Even if I add

# -*- coding: utf-8 -*-

to the beginning it does nothing.

Another weird thing: when I run it using only python, not python3, it works. How can it work in Python 2.7.12 but not in 3.5.2?

Any ideas for solving this permanently? Thank you.

  • It sounds like your environment isn't configured correctly for UTF-8. That's why Python is defaulting to ascii when printing Unicode. Commented Jan 1, 2017 at 0:18
  • Possible duplicate of UnicodeEncodeError when writing to file Commented Jan 2, 2017 at 10:27
  • Your locale must be broken. Perhaps your .bashrc sets LANG=cs_CZ.UTF-8 but you haven't built/installed the Czech locale? Python will default to ASCII encoding if your locale is broken or missing. It works in Python 2 because the string is a byte string and is simply written directly to your terminal. Python 3 needs to encode strings when writing to the terminal. Commented Jan 2, 2017 at 10:33
  • Thanks to @AlastairMcCormack for suggesting where the problem may be. That was indeed the cause. LANG was set to C, the default setting, which uses ASCII (the ANSI_X3.4-1968 charmap). Only a few LC_* variables were set to cs_CZ.UTF-8, and the others inherited C from LANG. I added these lines to /etc/default/locale: LANG=cs_CZ.UTF-8 LANGUAGE=cs_CZ.UTF-8 LC_ALL=cs_CZ.UTF-8 It works! Now, why am I writing this as a comment and not as an answer? The output of locale is now cs_CZ.UTF-8 everywhere except for LANG. Why can't I set this variable? Commented Jan 2, 2017 at 11:18
  • @Smety I'm glad it worked. You only need to set LANG in /etc/default/locale. Only configure the things like LANGUAGE if you want a specific exception, like having English error messages. Once set and you've restarted your session, then each LC_ should be the same. Check that LANG isn't being set in /etc/environment or your personal shell files. See help.ubuntu.com/community/Locale Commented Jan 2, 2017 at 18:23
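
The Python 2 versus Python 3 difference described in the comments can be imitated inside Python 3 alone. The sketch below is only an illustration under assumptions: an in-memory io.BytesIO stands in for the terminal, and the helper names write_bytes and write_text are hypothetical. Bytes pass through untouched (as Python 2 byte strings do), while text has to be encoded by the stream first.

```python
import io

# "Příliš", spelled with escapes so this file itself stays pure ASCII.
TEXT = "P\u0159\xedli\u0161"


def write_bytes(data):
    """Write raw bytes to a fake terminal; nothing is encoded on the way."""
    raw = io.BytesIO()
    raw.write(data)
    return raw.getvalue()


def write_text(text, encoding):
    """Write str through a text layer that encodes it with `encoding`."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


utf8_bytes = TEXT.encode("utf-8")

# Bytes go through unchanged, whatever the locale says.
print(write_bytes(utf8_bytes) == utf8_bytes)

# Text works when the stream encodes as UTF-8 ...
print(write_text(TEXT, "utf-8") == utf8_bytes)

# ... and fails when the stream falls back to ASCII.
try:
    write_text(TEXT, "ascii")
except UnicodeEncodeError as err:
    print("ascii stream failed:", err.reason)
```

In the real situation the text layer is sys.stdout, and its encoding comes from the locale, which is why fixing the locale fixes print().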

1 Answer

Thanks to Mark Tolen and Alastair McCormack for suggesting where the problem may be. The problem was really in the locale settings.
When I ran locale, the output was:

LANG=C
LANGUAGE=
LC_CTYPE="C"
LC_NUMERIC=cs_CZ.UTF-8
LC_TIME=cs_CZ.UTF-8
LC_COLLATE=cs_CZ.UTF-8
LC_MONETARY=cs_CZ.UTF-8
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT=cs_CZ.UTF-8
LC_IDENTIFICATION="C"
LC_ALL=

"C" is the default locale, and its charmap is plain ASCII. That is where the problem was: running locale charmap gave me ANSI_X3.4-1968, which is ASCII and cannot represent non-English characters such as the Czech letters in my script.
I fixed this using the Ubuntu locale documentation.
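
For anyone wondering how ANSI_X3.4-1968 relates to the 'ascii' codec named in the traceback, the two can be connected from Python itself. This is a minimal sketch using the standard codecs module; the helper names python_codec_for and first_unencodable are invented for illustration.

```python
import codecs


def python_codec_for(charmap):
    """Return the name of the Python codec that a locale charmap resolves to."""
    return codecs.lookup(charmap).name


def first_unencodable(text, charmap):
    """Return the index of the first character `charmap` cannot encode, or None."""
    try:
        text.encode(charmap)
    except UnicodeEncodeError as err:
        return err.start
    return None


# "Příliš", spelled with escapes so this file itself stays pure ASCII.
SAMPLE = "P\u0159\xedli\u0161"

print(python_codec_for("ANSI_X3.4-1968"))
print(python_codec_for("UTF-8"))
print(first_unencodable(SAMPLE, "ANSI_X3.4-1968"))
print(first_unencodable(SAMPLE, "UTF-8"))
```

The first lookup resolves to ascii, and the first character it rejects sits at index 1, which matches the "position 1-2" in the error message from the question.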

I added these lines to /etc/default/locale:

LANGUAGE=cs_CZ.UTF-8
LC_ALL=cs_CZ.UTF-8

Then you have to restart your session (log out and in) to apply these settings.

Running locale now returns this output:

LANG=C
LANGUAGE=cs
LC_CTYPE="cs_CZ.UTF-8"
LC_NUMERIC="cs_CZ.UTF-8"
LC_TIME="cs_CZ.UTF-8"
LC_COLLATE="cs_CZ.UTF-8"
LC_MONETARY="cs_CZ.UTF-8"
LC_MESSAGES="cs_CZ.UTF-8"
LC_PAPER="cs_CZ.UTF-8"
LC_NAME="cs_CZ.UTF-8"
LC_ADDRESS="cs_CZ.UTF-8"
LC_TELEPHONE="cs_CZ.UTF-8"
LC_MEASUREMENT="cs_CZ.UTF-8"
LC_IDENTIFICATION="cs_CZ.UTF-8"
LC_ALL=cs_CZ.UTF-8

and running locale charmap returns:

UTF-8
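
As for why this works even though LANG still shows C: the usual POSIX lookup order is LC_ALL first, then the specific LC_* variable, then LANG. The sketch below models that order with plain dictionaries. The function name effective_locale is hypothetical, the "C" fallback when nothing is set is an assumption, and the dictionaries hold only the values that appear unquoted in the two locale listings above (quoted values are implied, not set).

```python
def effective_locale(env, category):
    """Resolve a locale category using the order LC_ALL, then category, then LANG."""
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # an empty string counts as unset
            return value
    return "C"  # assumed fallback when nothing is set


# Explicitly set values before the fix.
before = {
    "LANG": "C",
    "LC_NUMERIC": "cs_CZ.UTF-8",
    "LC_ALL": "",
}

# Explicitly set values after the fix.
after = {
    "LANG": "C",
    "LANGUAGE": "cs",
    "LC_ALL": "cs_CZ.UTF-8",
}

# LC_CTYPE decides the character encoding.
print(effective_locale(before, "LC_CTYPE"))
print(effective_locale(after, "LC_CTYPE"))
```

Before the fix LC_CTYPE falls through to LANG and ends up as C; afterwards LC_ALL wins, so the value of LANG no longer matters for the encoding.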