UnicodeDecodeError in Python 3 when importing a CSV file

Question

I'm trying to import a CSV, using this code:

    import csv
    import sys

    def load_csv(filename):
        # Open file for reading
        file = open(filename, 'r')

        # Read in file
        return csv.reader(file, delimiter=',', quotechar='\n')

    def main(argv):
        csv_file = load_csv("myfile.csv")

        for item in csv_file:
            print(item)

    if __name__ == "__main__":
        main(sys.argv[1:])

Here's a sample of my CSV file:

    foo,bar,test,1,2
    this,wont,work,because,α

And the error:

    Traceback (most recent call last):
      File "test.py", line 22, in <module>
        main(sys.argv[1:])
      File "test.py", line 18, in main
        for item in csv_file:
      File "/usr/lib/python3.2/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 40: ordinal not in range(128)

Obviously, it's hitting the character at the end of the CSV and throwing that error, but I'm at a loss as to how to fix this. Any help?

This is:

    Python 3.2.3 (default, Apr 23 2012, 23:35:30)
    [GCC 4.7.0 20120414 (prerelease)] on linux2

jfs · Accepted Answer · 2021-08-14 14:47:20Z

22

It seems your problem boils down to:

    print("α")

You could fix it by specifying PYTHONIOENCODING:

    $ PYTHONIOENCODING=utf-8 python3 test.py > output.txt
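One way to confirm that the variable takes effect is to start a child interpreter with it set and ask for sys.stdout.encoding. This is only a sketch; the helper name stdout_encoding_with is made up for illustration and is not part of the original answer.

```python
import os
import subprocess
import sys


def stdout_encoding_with(io_encoding):
    """Return sys.stdout.encoding as reported by a child Python process.

    The child is started with PYTHONIOENCODING set to `io_encoding`.
    (Hypothetical helper, for illustration only.)
    """
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()


if __name__ == "__main__":
    print(stdout_encoding_with("utf-8"))
```

Because the child's stdout is a pipe here, this also shows that the setting applies when output is redirected, which is the situation in the command above.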

Note:

    $ python3 test.py

should work as is if your terminal configuration supports it, where test.py is:

    import csv

    with open('myfile.csv', newline='', encoding='utf-8') as file:
        for row in csv.reader(file):
            print(row)

If the open() call above omits the encoding parameter, the file is decoded with the locale's preferred encoding, so with LC_ALL=C you'll get UnicodeDecodeError.
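The decoding failure can be reproduced without touching the locale by asking for ASCII explicitly. A minimal sketch, assuming the sample file from the question is saved as UTF-8; the temporary file and the helper name read_rows are illustrative only:

```python
import csv
import os
import tempfile

SAMPLE = "foo,bar,test,1,2\nthis,wont,work,because,α\n"


def read_rows(path, encoding):
    """Read every CSV row from `path`, decoding it with `encoding`."""
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))


# Write the sample as UTF-8 bytes; "α" becomes the two bytes 0xCE 0xB1.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(SAMPLE.encode("utf-8"))
    path = tmp.name

try:
    utf8_rows = read_rows(path, "utf-8")  # decodes cleanly
    try:
        read_rows(path, "ascii")
        ascii_error = None
    except UnicodeDecodeError as exc:
        ascii_error = exc  # byte 0xce is outside the ASCII range
finally:
    os.remove(path)

print(utf8_rows[1])
print(ascii_error)
```

In this sample the first byte of "α" sits at offset 40 of the file, which is consistent with the byte 0xce and position 40 reported in the question's traceback.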

Also, with LC_ALL=C you'll get UnicodeEncodeError even if there is no redirection, i.e., PYTHONIOENCODING is necessary in this case (before PEP 538: Legacy C Locale Coercion, implemented in Python 3.7+).
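The encoding side can be sketched in a similar way. The snippet below stands in for an ASCII-only stdout with an io.TextIOWrapper over an in-memory buffer (an assumption made so the example does not depend on the terminal or locale), and shows that the same print() call succeeds once the stream uses UTF-8. The helper name try_print is hypothetical.

```python
import io


def try_print(text, encoding):
    """Print `text` to an in-memory text stream that uses `encoding`.

    Returns the bytes written, or the UnicodeEncodeError that was raised.
    """
    raw = io.BytesIO()
    # newline="\n" disables platform-specific newline translation.
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError as exc:
        return exc
    return raw.getvalue()


print(try_print("α", "ascii"))  # a UnicodeEncodeError instance
print(try_print("α", "utf-8"))  # b'\xce\xb1\n'
```

On Python 3.7+, calling sys.stdout.reconfigure(encoding='utf-8') at the start of the script is an in-process alternative to setting the environment variable.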

edited Aug 14, 2021 at 14:47

answered Oct 5, 2012 at 19:37

TheDude · Accepted Answer · 2012-10-05 19:24:11Z

13

According to the Python docs, you have to set the encoding when opening the file. Here is an example from the docs:

    import csv

    with open('some.csv', newline='', encoding='utf-8') as f:
        reader = csv.reader(f)
        for row in reader:
            print(row)

Edit: Your problem appears to happen when printing. Try using the pretty printer:

    import csv
    import pprint

    with open('some.csv', newline='', encoding='utf-8') as f:
        reader = csv.reader(f)
        for row in reader:
            pprint.pprint(row)

edited Oct 5, 2012 at 19:24

answered Oct 5, 2012 at 18:55

5 Comments

Ryan Rapini Over a year ago

Setting the encoding for the file does nothing to fix the issue... file = open(filename, 'r', encoding='utf-8') still gives me UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 40: ordinal not in range(128)

TheDude Over a year ago

Ah, it has to do with print not being able to display unicode characters. This question on Quora may have the answer -- it uses pretty printer: quora.com/How-do-you-print-a-python-unicode-data-structure

Ryan Rapini Over a year ago

I think the error has nothing to do with the print at all. It's hitting the error at the beginning of the for loop, before the print() even runs. Your edited sample code using pprint yields the same error as before, further reinforcing this claim. I'm stumped.

Ryan Rapini Over a year ago

export PYTHONIOENCODING=utf-8 fixed my issue.

Inês Martins Over a year ago

@betaRepeating "export PYTHONIOENCODING=utf-8 fixed my issue." could you explain further?

Ayush Abhijeet · Accepted Answer · 2019-12-30 20:40:19Z

6

Another option is to cover up the errors by passing an error handler:

    with open('some.csv', newline='', errors='replace') as f:
        reader = csv.reader(f)
        for row in reader:
            print(row)

which will replace any undecodable bytes in the file with the Unicode replacement character (U+FFFD).
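A small sketch of what that looks like on the problematic row from the question. It assumes the file holds UTF-8 bytes but is decoded as ASCII (forced here with an explicit encoding='ascii' so the result does not depend on the locale); the temporary file is only for illustration:

```python
import csv
import os
import tempfile

# UTF-8 bytes of the problematic row; "α" is stored as 0xCE 0xB1.
data = "this,wont,work,because,α\n".encode("utf-8")

with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

try:
    with open(path, newline="", encoding="ascii", errors="replace") as f:
        rows = list(csv.reader(f))
finally:
    os.remove(path)

# Each undecodable byte becomes U+FFFD (REPLACEMENT CHARACTER).
print(rows[0])
```

The read no longer fails, but the original character is lost: the last field holds two replacement characters instead of "α", so this hides the problem rather than decoding the data correctly.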

answered Dec 30, 2019 at 20:40
