Reading csv-file in Python containing undefined characters

Question

I am reading a CSV-file (ANSI) on my Windows-machine in Python using this code:

import csv
with open('ttest.dat') as csvDataFile:
    csvReader = csv.reader(csvDataFile, delimiter="\t")
    for i in csvReader:
        print(i)

However, I get the error: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4: character maps to <undefined>

Upon inspecting the file in Notepad++ (after converting it to UTF-8 encoding in Notepad) I see that the following appears:

It seems that these characters adjacent to hello are causing the issue. When I remove them manually the file can be read.

Is there a way to load the file in Python while explicitly telling it to disregard these odd characters? Or, alternatively, is there a method to strip these characters from the text automatically? My file is rather large, so manually checking each line isn't realistic.

Note: In R I can read the file without any issues using read.csv

ltd9938 · Accepted Answer · 2018-01-30 13:46:07Z

2

with open('ttest.dat', encoding="utf8") as csvDataFile:

This opens the file as UTF-8 rather than the platform default encoding, which is what open() uses when no encoding is given.
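To address the question's request to disregard the odd characters, open() also accepts an errors argument. Here is a minimal sketch, assuming the file is Windows-1252 (what Windows usually calls "ANSI" on Western European systems). The thread does not confirm that encoding, and the helper name iter_rows is hypothetical.

```python
import csv


def iter_rows(path, encoding="cp1252", errors="ignore"):
    """Yield tab-separated rows one at a time, tolerating undecodable bytes.

    The cp1252 default is an assumption about the file, not a known fact.
    errors="ignore" silently drops each undecodable byte;
    errors="replace" substitutes U+FFFD so the bad spots stay visible.
    """
    # newline="" is the setting the csv module documentation recommends.
    with open(path, encoding=encoding, errors=errors, newline="") as f:
        yield from csv.reader(f, delimiter="\t")
```

Because the rows are yielded lazily, a large file is never held in memory at once: for row in iter_rows('ttest.dat'): print(row). Note that errors="ignore" discards data without warning, so errors="replace" may be the safer choice while investigating.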

2 Comments

N08 Over a year ago

This circumvents the issue with the odd characters, but now I can't parse text that contains the letters æ, ø and å.

ltd9938 Over a year ago

Are those Norwegian letters? Try iso-8859-1. Otherwise, look up some tutorials on the subject. You will definitely find something that will help.
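The iso-8859-1 suggestion can be sketched as follows. ISO-8859-1 maps every byte to a code point, so decoding never fails and æ, ø and å survive. Bytes 0x80 to 0x9F (including 0x9d) become invisible C1 control characters, which can then be stripped. The helper name iter_clean_rows is hypothetical, and the sketch assumes those bytes are junk rather than meaningful Windows-1252 characters such as the euro sign or curly quotes.

```python
import csv
import re

# Under ISO-8859-1, bytes 0x80-0x9F decode to the C1 control characters
# U+0080-U+009F. Treating them all as unwanted is an assumption.
_C1_CONTROLS = re.compile(r"[\x80-\x9f]")


def iter_clean_rows(path):
    """Yield tab-separated rows with C1 control characters removed."""
    with open(path, encoding="iso-8859-1", newline="") as f:
        # Clean each line lazily before the csv reader parses it.
        cleaned = (_C1_CONTROLS.sub("", line) for line in f)
        yield from csv.reader(cleaned, delimiter="\t")
```

If the file really is Windows-1252, this approach would also strip legitimate characters in that byte range, so it is worth checking a few affected lines before relying on it.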
