I am trying to open, print, and read a text file that contains special characters such as §. Below is the code I am running:

    import codecs
    f = codecs.open('sample_text.txt', mode='r', encoding='utf_8')
    print f.readline()

The first two lines work, but the third does not. The error is:

    Traceback (most recent call last):
      File "C:\Users\mallikk\Documents\Python Scripts\special_char_test.py", line 6, in <module>
        print f.readline()
      File "C:\Anaconda2\lib\codecs.py", line 690, in readline
        return self.reader.readline(size)
      File "C:\Anaconda2\lib\codecs.py", line 545, in readline
        data = self.read(readsize, firstline=True)
      File "C:\Anaconda2\lib\codecs.py", line 492, in read
        newchars, decodedbytes = self.decode(data, self.errors)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xa7 in position 13: invalid start byte

Any ideas? Please let me know if I can clarify anything or add more details. Thank you so much!

  • This file is not encoded in UTF-8. Find the actual encoding and use that. Commented Jun 23, 2016 at 16:42
  • I don't think that 0xa7 is valid utf8. Are you sure it's in utf-8? Also why are you using codecs and not open? Commented Jun 23, 2016 at 16:47
  • stackoverflow.com/questions/4255305/… Commented Jun 23, 2016 at 16:53
  • @user2357112 It was not in utf-8. I changed it in Notepad++. Thanks for the help! Commented Jun 23, 2016 at 16:57
  • @Shivani This question discusses codecs.open vs builtin open and io.open. Looks like you are right in python2 while in python3 open is preferred. Commented Jun 23, 2016 at 17:07

2 Answers

To expand on what the commenters said, you need to find out the encoding of your file. The easiest way I know to do that is to:

  1. Open the file in Firefox.
  2. Right-click on the page and select "View Page Info"
  3. See what the "Text Encoding" is.
  4. Then you can check the codecs documentation for the codec to use instead of utf_8 in your f = codecs.open(...) line.

[Screenshot of steps 1–3]

It looks like you are on a Windows machine, where the text file's encoding may well be something other than UTF-8. Try decoding the byte string with cp1252 or ISO-8859-1, and then encode it again as UTF-8 if you need UTF-8 output.
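
As a rough sketch of that decode and re-encode round trip: the bytes below are made up, and cp1252 is an assumption about the real file. Byte 0xa7 is the section sign (§) in both cp1252 and ISO-8859-1, but on its own it is not a valid UTF-8 sequence, which matches the error in the question.

```python
# Made-up bytes standing in for the file's contents.
raw = b'See \xa7 4.2 for details\n'

# Decoding as UTF-8 fails, because 0xa7 cannot start a UTF-8 sequence.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Assumption: the bytes are really cp1252. Decode to a unicode string...
text = raw.decode('cp1252')

# ...then encode as UTF-8, where the section sign becomes two bytes.
utf8_bytes = text.encode('utf-8')
```

Note that ISO-8859-1 accepts every byte value, so a successful decode with it does not prove the guess was right. Check that the decoded text actually looks correct.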

For best practices on reading files, see also the question "Difference between open and codecs.open in Python".
