
Input file, chars.csv:

4,,x,,2,,9.012,2,,,,
6,,y,,2,,12.01,±4,,,,
7,,z,,2,,14.01,_3,,,,

When I try to parse this file, I get this error even after specifying utf-8 encoding.

>>> f=open('chars.csv',encoding='utf-8')
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb1 in position 36: invalid start byte

How can I correct this error?

Version: Python 3.2.3

2 Answers


Your input file is clearly not UTF-8 encoded, so you have at least these options:

  • f=open('chars.csv', encoding='utf-8', errors='ignore') if the file is mostly UTF-8 and you can accept losing the bytes that fail to decode. For other values of the errors parameter, check the manual.
  • Simply use the proper encoding, such as latin-1, if you know what it is.
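
As a rough sketch of how these options differ, the snippet below decodes one sample row in three ways. The bytes in `raw` are an assumption: the ± is taken to be stored as the single byte 0xB1, which is what the traceback suggests. The variable names are purely illustrative, and errors='replace' is shown only as one of the other documented values of the errors parameter.

```python
# Assumed sample: the second row of chars.csv with the plus-minus sign
# stored as the single byte 0xB1, as the traceback suggests.
raw = b"6,,y,,2,,12.01,\xb14,,,,\n"

# Option 1: stay with UTF-8 and silently drop undecodable bytes.
ignored = raw.decode("utf-8", errors="ignore")

# Variant: insert the replacement character U+FFFD instead of dropping.
replaced = raw.decode("utf-8", errors="replace")

# Option 2: decode with the encoding the file was (presumably) written in.
latin = raw.decode("latin-1")

print(repr(ignored))
print(repr(replaced))
print(repr(latin))
```

With errors='ignore' the sign disappears from the data entirely, so the Latin-1 route is usually preferable when the real encoding is known.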


The file is not UTF-8 encoded. In UTF-8, ± is encoded as \xC2\xB1 and U+0083 as \xC2\x83, whereas the error message shows a lone 0xb1 byte. As RobertT suggested, try Latin-1:

f=open('chars.csv',encoding='latin-1')
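
To check this claim without touching the real file, here is a minimal sketch that writes an assumed copy of the data to a temporary file and reads it back both ways. The contents of `data` are a guess at the original bytes (± as 0xB1, and the character in the last row as 0x83); the names `data`, `path`, and `error_position` are illustrative only.

```python
import os
import tempfile

# Assumed file contents: the question's rows, with the plus-minus sign as
# byte 0xB1 and the odd character in the last row as byte 0x83.
data = (b"4,,x,,2,,9.012,2,,,,\n"
        b"6,,y,,2,,12.01,\xb14,,,,\n"
        b"7,,z,,2,,14.01,\x833,,,,\n")

# Write a throwaway copy so the example never touches a real chars.csv.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(data)

try:
    # UTF-8 decoding stops at the first byte that is not valid UTF-8.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        error_position = None
    except UnicodeDecodeError as exc:
        error_position = exc.start

    # Latin-1 maps every byte to exactly one character, so it cannot fail.
    with open(path, encoding="latin-1") as f:
        text = f.read()
finally:
    os.remove(path)

print(error_position)
print("\u00b1".encode("utf-8"))
print(text)
```

Note that Latin-1 never raises an error, so a successful read does not prove the file really is Latin-1; it is worth checking that the decoded characters look right.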
