Python readline not working with codecs

Question

I am trying to open, read, and print a text file that contains special characters such as §. Below is the code I am running:

    import codecs
    f = codecs.open('sample_text.txt', mode='r', encoding='utf_8')
    print f.readline()

The first two lines work, but the third does not. The error message is:

    Traceback (most recent call last):
      File "C:\Users\mallikk\Documents\Python Scripts\special_char_test.py", line 6, in <module>
        print f.readline()
      File "C:\Anaconda2\lib\codecs.py", line 690, in readline
        return self.reader.readline(size)
      File "C:\Anaconda2\lib\codecs.py", line 545, in readline
        data = self.read(readsize, firstline=True)
      File "C:\Anaconda2\lib\codecs.py", line 492, in read
        newchars, decodedbytes = self.decode(data, self.errors)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xa7 in position 13: invalid start byte

Any ideas? Please let me know if I can clarify anything or add more details. Thank you so much!
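The failing byte can be reproduced in isolation. The minimal Python 3 sketch below assumes, as the comments suggest, that byte 0xa7 is the § sign saved in a single-byte Windows encoding such as cp1252. UTF-8 stores § as the two bytes 0xC2 0xA7, so a lone 0xA7 (a UTF-8 continuation byte) is rejected as an invalid start byte.

```python
# Python 3 sketch: why a lone 0xA7 byte is rejected by the UTF-8 decoder.
section_sign_utf8 = "§".encode("utf-8")
print(section_sign_utf8)  # b'\xc2\xa7': UTF-8 needs two bytes for §

try:
    b"\xa7".decode("utf-8")  # how cp1252 or Latin-1 stores § on its own
    error_reason = None
except UnicodeDecodeError as exc:
    error_reason = exc.reason
print(error_reason)  # invalid start byte
```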

This file is not encoded in UTF-8. Find the actual encoding and use that. – user2357112, Commented Jun 23, 2016 at 16:42
I don't think that 0xa7 is valid UTF-8. Are you sure the file is in UTF-8? Also, why are you using codecs and not open? – syntonym, Commented Jun 23, 2016 at 16:47
@user2357112 It was not in UTF-8. I changed it in Notepad++. Thanks for the help! – Shivani, Commented Jun 23, 2016 at 16:57
@Shivani This question discusses codecs.open vs. the built-in open and io.open. It looks like you are right for Python 2, while in Python 3 open is preferred. – syntonym, Commented Jun 23, 2016 at 17:07
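A Python 3 sketch of what the Notepad++ conversion does, using the built-in open with explicit encodings (in Python 2, io.open accepts the same encoding and newline arguments). The file names, and the assumption that the original file is cp1252, are hypothetical; substitute the encoding your file actually uses.

```python
# Python 3 sketch: re-save a cp1252 file as UTF-8, as done in Notepad++.
# Assumptions: the source really is cp1252; file names are hypothetical.
import os
import tempfile

folder = tempfile.mkdtemp()
src = os.path.join(folder, "sample_text.txt")
dst = os.path.join(folder, "sample_text_utf8.txt")

# Stand-in source file: § stored as the single cp1252 byte 0xA7.
with open(src, "wb") as f:
    f.write(b"Section \xa7 1\r\n")

# Decode with the real encoding, then write the text back out as UTF-8.
# newline="" on both ends keeps the \r\n line ending untouched.
with open(src, encoding="cp1252", newline="") as fin, \
        open(dst, "w", encoding="utf-8", newline="") as fout:
    fout.write(fin.read())

# The converted copy now reads cleanly as UTF-8 (\r\n becomes \n here).
with open(dst, encoding="utf-8") as f:
    first_line = f.readline()
```

Passing newline="" when converting preserves the Windows line endings byte for byte; reading the copy afterwards in the default text mode translates them to \n.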

cxw · Accepted Answer · 2016-06-23 16:55:57Z


To expand on what the commenters said, you need to find out the encoding of your file. The easiest way I know to do that is to:

1. Open the file in Firefox.
2. Right-click on the page and select "View Page Info".
3. See what the "Text Encoding" is.
Then you can check the codecs documentation for the codec to use instead of utf_8 in your f = codecs.open(...) line.
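A minimal Python 3 sketch of that last step, assuming Firefox reported windows-1252 (a hypothetical value; use whatever your file shows). codecs.lookup resolves the label to Python's canonical codec name, which confirms that Python knows the encoding before it is passed to codecs.open. The sample file is created on the fly so the example is self-contained.

```python
# Python 3 sketch: use the encoding Firefox reports in codecs.open().
# Assumption: Firefox showed "windows-1252"; the file name is hypothetical.
import codecs
import os
import tempfile

reported = "windows-1252"
codec_name = codecs.lookup(reported).name  # canonical name: 'cp1252'

# Stand-in for sample_text.txt: § is stored as the single byte 0xA7.
path = os.path.join(tempfile.mkdtemp(), "sample_text.txt")
with open(path, "wb") as raw:
    raw.write(b"Section \xa7 2\n")

with codecs.open(path, mode="r", encoding=codec_name) as f:
    first_line = f.readline()  # 'Section § 2\n', no UnicodeDecodeError

print(codec_name)  # cp1252
```

In Python 3, the built-in open(path, encoding=codec_name) does the same job and is the preferred spelling, as the comments on the question point out.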

[Screenshot of steps 1–3]

answered Jun 23, 2016 at 16:55 by cxw


Stanley Kirdey · Answer · 2016-06-23 17:17


It looks like you are on a Windows machine, where the text file's encoding may not be UTF-8. Try decoding the byte string with cp1252 (or ISO-8859-1) and then, if you need UTF-8, encode it again as UTF-8.
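A minimal Python 3 sketch of that decode-then-encode round trip on an in-memory byte string (the sample bytes are hypothetical). For byte 0xA7, cp1252 and ISO-8859-1 agree: both map it to §.

```python
# Python 3 sketch: decode with a Windows code page, re-encode as UTF-8.
raw = b"Section \xa7 3"  # hypothetical bytes read from the file

text = raw.decode("cp1252")             # 'Section § 3'
latin1_text = raw.decode("iso-8859-1")  # same result for byte 0xA7
utf8_bytes = text.encode("utf-8")       # b'Section \xc2\xa7 3'

print(text == latin1_text)  # True
print(utf8_bytes)           # b'Section \xc2\xa7 3'
```

The two code pages are not interchangeable in general: they differ in the 0x80–0x9F range (cp1252 maps 0x80 to €, for example), and ISO-8859-1 decodes every possible byte without error, so a successful decode does not prove the guess was right.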

For best-practice advice on reading files, you can also take a look at the question "Difference between open and codecs.open in Python".

edited May 23, 2017 at 11:58 by Community Bot

answered Jun 23, 2016 at 17:17 by Stanley Kirdey
