Python: Ascii characters from file display wrong

Question

Here's my code:

import sys, os

print("█████") #<-- Those are solid blocks.
f = open('file.txt')
for line in f:
    print(line)

In file.txt is this:

hay hay, guys
████████████

But the output is this:

██████
hay hay, guys <----- ***Looks like it output this correctly!***

Traceback (most recent call last):
  File "echofile.py", line 6, in <module>
    print(line)
  File "C:\python33\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1-2: character maps to <undefined> <------ ***But not from the file!***

Does anybody have any suggestions as to why it is doing this? I wrote the code in IDLE, and tried editing file.txt in both Programmer's Notepad and IDLE. The file is ASCII / ANSI. I'm using Python 3, by the way (3.3 alpha, win-64, if it matters).

steveha · Accepted Answer · 2012-04-19 00:47:31Z

This is clearly an issue with character encodings.

In Python 3.x, all strings are Unicode. But a file holds bytes, so when reading or writing a file Python must translate between Unicode and some specific encoding: it decodes bytes into a string on input and encodes a string into bytes on output.
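A minimal sketch of that translation, using the full-block character from the question (U+2588): encoding turns a str into bytes, decoding turns bytes back into a str, and the same character becomes different bytes under different encodings.

```python
# U+2588 FULL BLOCK, the character used in the question
block = "\u2588"

# Encoding: str -> bytes. The byte values depend on the encoding chosen.
utf8_bytes = block.encode("utf-8")    # three bytes in UTF-8
cp437_bytes = block.encode("cp437")   # a single byte in Code Page 437

# Decoding: bytes -> str. Decoding with the same encoding round-trips cleanly.
assert utf8_bytes.decode("utf-8") == block
assert cp437_bytes.decode("cp437") == block

print(utf8_bytes)   # b'\xe2\x96\x88'
print(cp437_bytes)  # b'\xdb'
```

Decoding those UTF-8 bytes with any other codec would not give back the block, which is the root of the problem in the question.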

By default, a Python source file is handled as UTF-8. I don't know exactly what characters you pasted into your source file for the blocks, but whatever they are, Python reads them as UTF-8 and it seems to work. Maybe your text editor saved them as valid UTF-8 when you inserted them?

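The backtrace shows the "Code Page 437" codec (cp437, the original IBM PC 8-bit character set) failing while it encodes a line for output, so cp437 is the encoding Python is using for your console. An error like this usually means the line was first decoded from the file with the wrong encoding, producing characters that cp437 can't represent. What encoding is the file actually in?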
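A minimal reproduction of how such a mismatch can produce this traceback, under assumptions the question does not confirm: file.txt is really saved as UTF-8, open() decoded it as cp1252 (a common Windows default), and the console encodes output as cp437 (the codec named in the traceback). Under those assumptions, the error should cover the same "position 1-2" range as the traceback.

```python
# Hypothetical setup: the bytes a UTF-8 file.txt stores for a line of blocks.
raw = "\u2588\u2588\u2588\n".encode("utf-8")

# Assumed step 1: open() decodes with cp1252. No error is raised; the three
# UTF-8 bytes of each block turn into three unrelated characters (mojibake).
misread = raw.decode("cp1252")
print(ascii(misread))

# Assumed step 2: print() encodes the line for a cp437 console. Some of the
# mojibake characters have no cp437 byte, so this is where it fails.
error = None
try:
    misread.encode("cp437")
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; "exc" is cleared after the except block
    print(error)
```

Note that the wrong decode is silent; only the later encode for the console raises, which is why the traceback points at print() rather than open().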

This link shows how to set a specific decoder to handle a particular file encoding on input:

http://lucumr.pocoo.org/2010/2/11/porting-to-python-3-a-guide/

EDIT: I found a better resource:

http://docs.python.org/release/3.0.1/howto/unicode.html

And based on that, here's some sample code:

with open('mytextfile.txt', encoding='utf-8') as f:
    for line in f:
        print(line, end='')

Originally I had the above set to "cp437" but in a comment you said "utf-8" was correct, so I made that change to this example. I'm specifying end='' here because the input lines from the file already have a newline on the end, so we don't need print() to supply another newline.
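A self-contained version of the sample above, as a sketch: since 'mytextfile.txt' may not exist, it writes a hypothetical temporary file with the question's contents as UTF-8, reads it back with an explicit encoding, and prints each line with end='' so no blank lines appear between them.

```python
import os
import tempfile

# Hypothetical sample file with the question's contents, saved as UTF-8.
text = "hay hay, guys\n\u2588\u2588\u2588\u2588\n"
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                 delete=False) as tmp:
    tmp.write(text)
    path = tmp.name

lines = []
try:
    # Name the encoding explicitly instead of relying on the platform default.
    with open(path, encoding="utf-8") as f:
        for line in f:
            lines.append(line)
            # Each line already ends in "\n", so suppress print()'s own newline.
            print(line, end="")
finally:
    os.remove(path)  # clean up the temporary file
```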

EDIT: I found a short discussion of default encodings here:

http://docs.python.org/release/3.0.1/whatsnew/3.0.html

The important bit: "There is a platform-dependent default encoding, which on Unixy platforms can be set with the LANG environment variable (and sometimes also with some other platform-specific locale-related environment variables). In many cases, but not all, the system default is UTF-8; you should never count on this default."

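So, I had thought that Python defaulted to UTF-8, but not always, it seems. On Windows the LANG variable doesn't apply: the default for open() comes from the system's ANSI code page (typically cp1252 on English and Western European systems), while the console uses cp437, as your stack backtrace shows. Neither of those is UTF-8.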
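To check which defaults apply on a given machine, a quick sketch: in Python 3, open() without an encoding argument uses locale.getpreferredencoding(False), and print() encodes with sys.stdout.encoding. The values are platform-dependent, so no particular output is assumed here.

```python
import locale
import sys

# Encoding open() falls back to when no encoding= argument is given
# (on Windows this is typically the ANSI code page, such as cp1252).
default_enc = locale.getpreferredencoding(False)

# Encoding print() uses for the console (cp437 in the question's traceback).
# getattr guards against stdout having been replaced by an object without it.
console_enc = getattr(sys.stdout, "encoding", None)

print("open() default:", default_enc)
print("stdout:", console_enc)
```

On Python 3.6 and later, writes to the Windows console are done as UTF-8 (PEP 528), so the printing half of this error is less likely there; the open() default is still platform-dependent, which is why passing encoding= explicitly remains the safe choice.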

So, I learned something too by answering your question!

P.S. I changed the code example above to specify utf-8 since that is what you needed.

edited Apr 19, 2012 at 0:47

answered Apr 18, 2012 at 21:06

steveha

3 Comments

SuperDisk Over a year ago

cp437 outputs this: █▓▓▓▓▓▓█ <--- Printed hay hay, guys asdfΓûôΓûôΓûô Γûê Γûê <-- Printed from file.

steveha Over a year ago

In that case, I am going to guess that cp437 is not correct. UTF-8 is the default, and that wasn't correct either. I don't know what to tell you; you need to figure it out.

SuperDisk Over a year ago

Actually, just specifying the encoding as 'utf-8' works but not specifying an encoding doesn't. Weird.

Blender · Accepted Answer · 2012-04-18 21:00:16Z

Try making that string unicode:

print(u"█████")
      ^ Add this

answered Apr 18, 2012 at 21:00

Blender

4 Comments

steveha Over a year ago

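He is using Python 3.x, where all strings are Unicode. Python 3.0 through 3.2 reject the u"" prefix outright; 3.3 accepts it again (PEP 414), but only as a no-op for compatibility, so it wouldn't change anything here.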

SuperDisk Over a year ago

That's not what is causing it. The print itself works fine; printing from the file is what causes the error. This is also Python 3.

SuperDisk Over a year ago

Turns out the answer was Blender's comment. Open the file as read-binary ('rb') and then do print(line.decode()). It seems like a bit of a hack job, though. Is there a 'clean' way to do it?

steveha Over a year ago

@user1068392, the clean way to do it is just to specify an encoding in the call to open(). See my answer for sample code.
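A sketch comparing the two approaches from this thread on a hypothetical UTF-8 temporary file: the 'rb' workaround happens to work because bytes.decode() with no argument uses UTF-8 in Python 3, so it is equivalent to passing encoding='utf-8' to open().

```python
import os
import tempfile

# Hypothetical UTF-8 sample file, as in the question.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("hay hay, guys\n\u2588\u2588\u2588\n".encode("utf-8"))
    path = tmp.name

try:
    # The workaround from the comments: read bytes, decode each line manually.
    # bytes.decode() with no argument decodes as UTF-8 in Python 3.
    with open(path, "rb") as f:
        decoded = [line.decode() for line in f]

    # The cleaner equivalent: let open() do the decoding.
    with open(path, encoding="utf-8") as f:
        text_lines = list(f)
finally:
    os.remove(path)  # clean up the temporary file

print(decoded == text_lines)  # True
```

Besides being shorter, the text-mode version handles newline translation and also works for encodings such as UTF-16, where splitting raw bytes on b"\n" can cut a character in half.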
