I'm trying to create a pandas DataFrame from a CSV file, but I get a decoding error when I run the script.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 7: invalid start byte

During handling of the above exception, another exception occurred:

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-6-b08d34b86a52> in <module>
----> 1 raw_data = pd.read_csv("D:\\Data\\anon\\anon_Logs\\anon_logs.csv", sep=',', low_memory=False)
      2 display(raw_data)

I understand that this can happen when an invalid character is present, but does the error refer to a character in the CSV itself, or am I writing something wrong? The only line I'm executing is the one shown above, a simple read_csv.

How can I get around this? This is the first time I've had problems creating a DataFrame from a CSV.

1 Answer

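The error refers to a byte in the CSV file itself, not to anything in your code. 0xa0 is the Unicode code point of the NO-BREAK SPACE character, and in Latin1 that character is stored as the single byte 0xa0. In UTF-8 a lone 0xa0 byte is invalid, which hints that the file is encoded in a Latin variant rather than UTF-8. If you are unsure of the actual encoding, 'Latin1' can always be used, because every possible byte value is valid in Latin1 and decodes to the character with that code point. The drawback is that if the real encoding is different, some characters will be decoded incorrectly without any error being raised.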
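To make the difference concrete, here is a minimal sketch using a made-up byte string (the sample bytes are an assumption, not taken from the asker's file). It shows UTF-8 rejecting a lone 0xa0, Latin1 accepting it, and how Latin1 and Windows-1252 can disagree on another byte:

```python
# Hypothetical sample bytes; 0xA0 sits where a space would normally be.
raw = b"value\xa0here"

# UTF-8 rejects a lone 0xA0 because it can only be a continuation byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print("utf-8 failed:", exc)

# Latin1 never fails: each byte becomes the character with the same code point.
latin1_text = raw.decode("latin-1")
print(repr(latin1_text))

# The same byte can mean different characters in different encodings,
# so a "successful" Latin1 decode may still yield wrong text.
other_byte = b"\x80"
print(repr(other_byte.decode("latin-1")))  # a control character in Latin1
print(repr(other_byte.decode("cp1252")))   # the euro sign in Windows-1252
```

If the file came from a Windows program, 'cp1252' may be worth trying as well; unlike Latin1 it raises an error on the few byte values it leaves undefined.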

My advice is to use:

raw_data = pd.read_csv("D:\\Data\\anon\\anon_Logs\\anon_logs.csv", sep=',', low_memory=False, encoding='Latin1')

and then look into the dataframe for possible conversion problems.
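
One possible way to look for conversion problems is to list the rows that contain any non-ASCII character after loading. This is only a sketch: the in-memory sample data and the helper name non_ascii_rows are assumptions made for illustration, and in practice you would pass your own file path to read_csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV; the 0xA0 and 0xE9 bytes make it invalid UTF-8.
sample = b"name,city\nAnn\xa0Lee,Paris\nBob,Li\xe9ge\nCy,Rome\n"

df = pd.read_csv(io.BytesIO(sample), encoding="latin-1")


def non_ascii_rows(frame):
    """Return the rows where any cell contains a character outside ASCII."""
    mask = pd.Series(False, index=frame.index)
    for col in frame.columns:
        # astype(str) lets the check run on numeric and missing values too.
        mask |= frame[col].astype(str).str.contains(r"[^\x00-\x7f]", regex=True)
    return frame[mask]


suspect = non_ascii_rows(df)
print(suspect)
```

Rows flagged this way are not necessarily wrong; they are just the places where a mistaken encoding guess would show up, so they are the ones worth checking by eye.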
