I am reading an ANSI-encoded CSV file on my Windows machine in Python using this code:

import csv
with open('ttest.dat') as csvDataFile:
    csvReader = csv.reader(csvDataFile, delimiter="\t")
    for i in csvReader:
        print(i)

However, I get the error: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4: character maps to <undefined>
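The error comes from the encoding Python picks when none is given: on Windows, open() falls back to the locale's preferred encoding, which on Western-European systems is usually cp1252 (the "ANSI" code page). cp1252 assigns no character to byte 0x9d, so its 'charmap' codec fails. A minimal sketch reproducing this without the file, assuming cp1252 is the default here (the sample bytes are made up):

```python
import locale

# Encoding that open() uses when no encoding= argument is passed
# (typically 'cp1252' on a Western-European Windows install).
print("default encoding:", locale.getpreferredencoding(False))

# Made-up sample: "hello" followed by the stray byte 0x9d
raw = b"hello\x9d"

try:
    raw.decode("cp1252")
    error = None
except UnicodeDecodeError as exc:
    # cp1252 leaves 0x9d undefined, so decoding stops at that byte
    error = exc
    print(exc)
```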

Upon inspecting the file in Notepad++ (after converting it to UTF-8 encoding in Notepad), I see some unusual characters next to the word "hello".

These characters next to "hello" seem to be causing the issue: when I remove them manually, the file can be read.
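Since removing them by hand fixes the file, it can help to first find out which lines contain them. A sketch that scans the file in binary mode and reports each line that cp1252 cannot decode; the function name is hypothetical, and cp1252 is an assumption about what "ANSI" means on this machine:

```python
import os
import tempfile


def find_undecodable_lines(path, encoding="cp1252"):
    """Hypothetical helper: list lines that fail to decode with `encoding`.

    Returns (line_number, offset_in_line, bad_byte) tuples. Only the first bad
    byte of each line is reported, because decoding stops at the first error.
    """
    problems = []
    with open(path, "rb") as f:  # binary mode: lines are read as raw bytes
        for lineno, raw_line in enumerate(f, start=1):
            try:
                raw_line.decode(encoding)
            except UnicodeDecodeError as exc:
                problems.append((lineno, exc.start, raw_line[exc.start]))
    return problems


# Made-up sample: 0xF8 is "ø" in cp1252, 0x9d is undefined
sample = b"id\tname\n1\t\xf8l\n2\thello\x9d\n"
with tempfile.NamedTemporaryFile("wb", suffix=".dat", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    problems = find_undecodable_lines(path)
finally:
    os.remove(path)

for lineno, offset, byte in problems:
    print(f"line {lineno}, byte {offset}: 0x{byte:02x}")
```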

Is there a way to load the file in Python while explicitly telling it to ignore these odd characters? Alternatively, is there a way to strip these characters from the text automatically? The file is rather large, so going through each line manually isn't realistic.
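One way to do this is the errors= argument of open(), which tells the codec what to do with any byte the chosen encoding cannot decode. A minimal sketch, assuming the file really is cp1252 ("ANSI" on a Western-European Windows setup); the helper name read_rows and the sample data are made up for illustration:

```python
import csv
import os
import tempfile


def read_rows(path, encoding="cp1252", errors="ignore"):
    """Hypothetical helper: parse a tab-separated file, dealing with bad bytes.

    errors="ignore" silently drops bytes the codec cannot decode (such as 0x9d
    in cp1252); errors="replace" substitutes U+FFFD for each of them instead.
    """
    # newline="" lets the csv module handle line endings, as its docs recommend
    with open(path, encoding=encoding, errors=errors, newline="") as f:
        return list(csv.reader(f, delimiter="\t"))


# Made-up sample in cp1252: a stray 0x9d after "hello", and 0xC5 for "Å"
sample = b"name\tcity\nhello\x9d\t\xc5lesund\n"
with tempfile.NamedTemporaryFile("wb", suffix=".dat", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    ignored = read_rows(path)                     # bad byte dropped
    replaced = read_rows(path, errors="replace")  # bad byte becomes U+FFFD
finally:
    os.remove(path)

print(ignored)  # [['name', 'city'], ['hello', 'Ålesund']]
print(replaced)
```

For a very large file, iterating over csv.reader directly instead of building a list avoids holding every row in memory.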

Note: in R, I can read the file without any issues using read.csv.

Answer

with open('ttest.dat', encoding="utf8") as csvDataFile:

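This opens the file as UTF-8 instead of with the Windows default ANSI code page (typically cp1252), which is the encoding that produced the 'charmap' error.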
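A complete version of that one-liner, assuming the file really is UTF-8. The made-up sample contains a right double quote (U+201D), which UTF-8 encodes as the bytes E2 80 9D; the last of these is exactly a 0x9d byte that cp1252 cannot decode, which illustrates how a UTF-8 file can trigger the original error:

```python
import csv
import os
import tempfile

# Made-up sample, written as UTF-8: U+201D becomes the bytes E2 80 9D
sample = "hello\u201d\tæøå\n".encode("utf-8")

with tempfile.NamedTemporaryFile("wb", suffix=".dat", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # Name the encoding explicitly instead of relying on the Windows default;
    # newline="" is what the csv module docs recommend for file objects.
    with open(path, encoding="utf8", newline="") as csvDataFile:
        rows = list(csv.reader(csvDataFile, delimiter="\t"))
finally:
    os.remove(path)

print(rows)
```

If the file is not actually UTF-8, this will typically raise a 'utf-8' codec error instead, for example on the cp1252 bytes for æ, ø and å.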


Comments

This circumvents the issue with the odd characters, but now I can't parse text that contains the letters æ, ø and å.
Are those Norwegian letters? Try iso-8859-1. Otherwise, look up some tutorials on the subject; you should find something that helps.
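Following the iso-8859-1 suggestion: that encoding (also called latin-1) assigns a character to every one of the 256 byte values, so decoding never fails, and æ, ø and å land on the same code points as in cp1252. The stray 0x9d then comes through as the invisible control character U+009D, which can be stripped afterwards. A sketch, with a hypothetical helper name and made-up sample data:

```python
import csv
import os
import re
import tempfile

# iso-8859-1 decodes bytes 0x80-0x9F to the C1 control characters U+0080-U+009F
C1_CONTROLS = re.compile(r"[\x80-\x9f]")


def read_rows_latin1(path):
    """Hypothetical helper: yield rows with C1 control characters removed."""
    # iso-8859-1 never raises a decode error, whatever the bytes are
    with open(path, encoding="iso-8859-1", newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            yield [C1_CONTROLS.sub("", field) for field in row]


# Made-up sample: 0xE6, 0xF8, 0xE5 are æ, ø, å; 0x9d is the stray byte
sample = b"navn\tord\nhello\x9d\t\xe6\xf8\xe5\n"
with tempfile.NamedTemporaryFile("wb", suffix=".dat", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    rows = list(read_rows_latin1(path))
finally:
    os.remove(path)

print(rows)  # [['navn', 'ord'], ['hello', 'æøå']]
```

This also deletes any genuine cp1252 characters in the 0x80-0x9F range, such as € (0x80) or curly quotes. If the file contains those, decoding as cp1252 with errors="ignore" keeps them while still dropping 0x9d.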
