Getting UnicodeDecodeError while accessing csv file

Question

Input file: chars.csv

4,,x,,2,,9.012,2,,,,
6,,y,,2,,12.01,Â±4,,,,
7,,z,,2,,14.01,_3,,,,

When I try to parse this file, I get this error even after specifying utf-8 encoding.

>>> f=open('chars.csv',encoding='utf-8')
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb1 in position 36: invalid start byte

How can I correct this error?

Version: Python 3.2.3

RobertT · Accepted Answer · 2013-04-12 07:34:21Z

Your input file is clearly not UTF-8 encoded, so you have at least these options:

Use f=open('chars.csv', encoding='utf-8', errors='ignore') if the file is mostly UTF-8 and you don't mind losing a small amount of data. For other values of the errors parameter, see the documentation for open().
Use the proper encoding, such as latin-1, if you know it.
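
To illustrate how these options differ, here is a minimal sketch. The bytes in `data` are an assumption: they are reconstructed from the first two rows of the question and its traceback (a lone 0xB1 byte at position 36, LF line endings), not read from the real file.

```python
# Assumed file contents: first two rows, with a single 0xB1 byte where "±" belongs.
data = b"4,,x,,2,,9.012,2,,,,\n6,,y,,2,,12.01,\xb14,,,,\n"

# Strict UTF-8 decoding fails, as in the question.
try:
    data.decode("utf-8")
    bad_position = None
except UnicodeDecodeError as err:
    bad_position = err.start  # index of the offending byte

# errors='ignore' silently drops the undecodable byte (data loss).
ignored = data.decode("utf-8", errors="ignore")

# errors='replace' substitutes U+FFFD, so the damage stays visible.
replaced = data.decode("utf-8", errors="replace")

# Latin-1 maps every byte to a code point, so decoding never fails.
latin1 = data.decode("latin-1")

print(bad_position)
print(repr(ignored.splitlines()[1]))
print(repr(replaced.splitlines()[1]))
print(repr(latin1.splitlines()[1]))
```

Note that errors='ignore' turns "±4" into "4", which changes the meaning of the value, so picking the correct encoding is usually the safer choice when it is known.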

Apprentice Queue · Accepted Answer · 2013-04-12 07:56:56Z

The file is not UTF-8 encoded. In UTF-8, ± is encoded as \xC2\xB1 and Â as \xC3\x82, so a lone \xB1 byte is invalid. As RobertT suggested, try Latin-1:

f=open('chars.csv',encoding='latin-1')
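
As a rough sketch of how this might look with the csv module: the example below writes assumed sample bytes (reconstructed from the question, with a lone 0xB1 byte for "±") to a temporary file, then reads it back as Latin-1. The temporary file stands in for chars.csv and is purely illustrative.

```python
import csv
import os
import tempfile

# Assumed sample contents; the real chars.csv is not available here.
sample = b"4,,x,,2,,9.012,2,,,,\n6,,y,,2,,12.01,\xb14,,,,\n"

# Write the bytes to a temporary file that plays the role of chars.csv.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # newline='' is what the csv module documentation recommends for file objects.
    with open(path, encoding="latin-1", newline="") as f:
        rows = list(csv.reader(f))
finally:
    os.remove(path)

# In Latin-1 the byte 0xB1 is "±"; in UTF-8 that character needs two bytes.
plus_minus_utf8 = "±".encode("utf-8")
plus_minus_latin1 = "±".encode("latin-1")

print(rows[1][7])
print(plus_minus_utf8, plus_minus_latin1)
```

Latin-1 never raises a decode error because every byte value maps to a character, so a successful read does not by itself prove the file really is Latin-1; check that the decoded text looks right.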
