Pandas dataframe giving decoding error when attempting to use read_csv

Question

I'm trying to make a pandas DF from a csv file, but I'm getting a decoding error upon attempting to run the script.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 7: invalid start byte

During handling of the above exception, another exception occurred:

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-6-b08d34b86a52> in <module>
----> 1 raw_data = pd.read_csv("D:\\Data\\anon\\anon_Logs\\anon_logs.csv", sep=',', low_memory=False)
      2 display(raw_data)

I understand that this can happen when an invalid character is present, but does the error refer to a character in the csv itself, or am I writing something wrong? The only line I'm trying to execute is the one shown above, a simple read_csv.

How can I get around this? This is the first time I've had trouble making a DF from a csv.

Serge Ballesta · Accepted Answer · 2019-12-30 16:39:59Z

0xa0 is the Unicode code point of the NO-BREAK SPACE character, and in Latin1 that character is stored as the single byte 0xa0. A lone 0xa0 byte is not valid UTF-8, so the error is a hint that the file is encoded in a Latin variant rather than UTF-8. If you are unsure of the actual encoding, 'Latin1' can be used in any case: every possible byte value is valid in Latin1 and decodes to the character with the same code point, so decoding never fails. The drawback is that if the file actually uses a different encoding, some characters will be decoded incorrectly.
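The behaviour described above can be checked with a few lines of plain Python. This is only an illustrative sketch: the `raw` bytes are a made-up sample, not taken from the file in the question.

```python
import unicodedata

# Hypothetical sample: ASCII text with a single 0xa0 byte in the middle.
raw = b"value\xa0here"

# UTF-8 rejects a lone 0xa0: it can only be a continuation byte, never a start byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Latin1 maps each byte 0x00-0xff to the code point with the same number.
text = raw.decode("latin1")
char_name = unicodedata.name(text[5])  # the character decoded from 0xa0

# All 256 possible byte values decode without error under Latin1.
all_bytes = bytes(range(256)).decode("latin1")

print(utf8_ok, char_name, len(all_bytes))
```

This only shows that Latin1 never raises; it does not show that Latin1 is the right encoding for a given file.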

My advice is to use:

raw_data = pd.read_csv("D:\\Data\\anon\\anon_Logs\\anon_logs.csv", sep=',', low_memory=False, encoding='Latin1')

and then look into the dataframe for possible conversion problems.
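One possible way to approach both steps is sketched below using only the standard library. The helper names `first_working_encoding` and `c1_control_chars` are hypothetical, and trying `cp1252` before `latin1` is an assumption of this sketch rather than part of the advice above. The second helper looks for characters in the range U+0080 to U+009F, which rarely appear in real text and usually mean the file was not really Latin1.

```python
def first_working_encoding(data: bytes, candidates=("utf-8", "cp1252", "latin1")) -> str:
    """Return the first candidate encoding that decodes `data` without error.

    The candidate list is an assumption; 'latin1' is kept last because it
    accepts every byte and therefore always succeeds.
    """
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


def c1_control_chars(text: str) -> set:
    """Collect characters in U+0080..U+009F, a common sign of a wrong Latin1 guess."""
    return {ch for ch in text if 0x80 <= ord(ch) <= 0x9F}


# Hypothetical sample bytes standing in for the contents of the csv file.
sample = b"name,comment\nanon,\x93quoted\x94\xa0text\n"
encoding = first_working_encoding(sample)
suspects = c1_control_chars(sample.decode("latin1"))
print(encoding, sorted(hex(ord(ch)) for ch in suspects))
```

The encoding name returned by the first helper could then be passed as the `encoding=` argument of `pd.read_csv`, and the second helper could be applied to the text columns of the resulting dataframe.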
