18

I'm trying to import a CSV, using this code:

    import csv
    import sys

    def load_csv(filename):
        # Open file for reading
        file = open(filename, 'r')

        # Read in file
        return csv.reader(file, delimiter=',', quotechar='\n')

    def main(argv):
        csv_file = load_csv("myfile.csv")

        for item in csv_file:
            print(item)

    if __name__ == "__main__":
        main(sys.argv[1:])

Here's a sample of my csv file:

    foo,bar,test,1,2
    this,wont,work,because,α

And the error:

    Traceback (most recent call last):
      File "test.py", line 22, in <module>
        main(sys.argv[1:])
      File "test.py", line 18, in main
        for item in csv_file:
      File "/usr/lib/python3.2/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 40: ordinal not in range(128)

Obviously, it's hitting the character at the end of the CSV and throwing that error, but I'm at a loss as to how to fix this. Any help?

This is:

    Python 3.2.3 (default, Apr 23 2012, 23:35:30)
    [GCC 4.7.0 20120414 (prerelease)] on linux2

3 Answers

22

It seems your problem boils down to:

    print("α")

You could fix it by specifying PYTHONIOENCODING:

    $ PYTHONIOENCODING=utf-8 python3 test.py > output.txt

Note:

    $ python3 test.py

should work as is if your terminal configuration supports it, where test.py is:

    import csv

    with open('myfile.csv', newline='', encoding='utf-8') as file:
        for row in csv.reader(file):
            print(row)

If the open() call above has no encoding parameter, you'll get a UnicodeDecodeError with LC_ALL=C, because the file is then decoded using the locale's encoding (ASCII).

Also, with LC_ALL=C you'll get a UnicodeEncodeError from print() even if there is no redirection, i.e., PYTHONIOENCODING is necessary in this case (on versions before Python 3.7, which implements PEP 538: Legacy C Locale Coercion).
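To see which of the two errors you are dealing with, it can help to look at the encodings Python has picked up from the environment. The following is a minimal sketch, not part of the original answer: the variable names are made up for illustration, and it reproduces both errors on in-memory data so it does not depend on your locale settings.

```python
import locale
import sys

# Encoding that open() falls back to in text mode when no encoding= is given
# (in the default configuration this comes from the locale, e.g. LC_ALL).
default_file_encoding = locale.getpreferredencoding(False)

# Encoding that print() uses when writing to standard output.
stdout_encoding = getattr(sys.stdout, "encoding", None)

print("default file encoding:", default_file_encoding)
print("stdout encoding:", stdout_encoding)

# "\u03b1" is the Greek letter alpha from the sample CSV.
alpha_bytes = "\u03b1".encode("utf-8")  # two bytes: 0xCE 0xB1

# Reading side: decoding UTF-8 bytes as ASCII fails on byte 0xce,
# the same byte that appears in the traceback in the question.
try:
    alpha_bytes.decode("ascii")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Writing side: encoding the character as ASCII fails as well,
# which is what happens inside print() when stdout is ASCII-only.
try:
    "\u03b1".encode("ascii")
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc

print("decode:", decode_error)
print("encode:", encode_error)
```

If the first two lines report an ASCII encoding (often shown as ANSI_X3.4-1968), the locale is the likely culprit: the explicit encoding='utf-8' argument addresses the reading side, and PYTHONIOENCODING addresses the printing side.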


13

According to the Python docs, you have to set the encoding when you open the file. Here is an example from the docs:

    import csv

    with open('some.csv', newline='', encoding='utf-8') as f:
        reader = csv.reader(f)
        for row in reader:
            print(row)
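To check the reading side in isolation, here is a small round-trip sketch. It is an illustration under assumptions rather than part of the docs example: it writes the two sample rows from the question to a temporary file as UTF-8 and reads them back with an explicit encoding, so the result does not depend on the locale. The names rows_in and rows_out are hypothetical.

```python
import csv
import os
import tempfile

# The two sample rows from the question; "\u03b1" is the Greek letter alpha.
rows_in = [
    ["foo", "bar", "test", "1", "2"],
    ["this", "wont", "work", "because", "\u03b1"],
]

# Create a temporary file path and close the low-level handle right away.
fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)

try:
    # Write the rows as UTF-8, regardless of the locale's default encoding.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows_in)

    # Read them back, again naming the encoding explicitly.
    with open(path, newline="", encoding="utf-8") as f:
        rows_out = list(csv.reader(f))
finally:
    os.remove(path)

# ascii() escapes non-ASCII characters, so this line is safe to print
# even when stdout itself is limited to ASCII.
for row in rows_out:
    print(ascii(row))
```

If this runs cleanly but printing the rows directly fails, the remaining problem is on the output side rather than in the file decoding.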

Edit: Your problem appears to happen with printing. Try using pretty printer:

    import csv
    import pprint

    with open('some.csv', newline='', encoding='utf-8') as f:
        reader = csv.reader(f)
        for row in reader:
            pprint.pprint(row)

5 Comments

Setting the encoding for the file does nothing to fix the issue... file = open(filename, 'r', encoding='utf-8') still gives me UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 40: ordinal not in range(128)
Ah, it has to do with print not being able to display unicode characters. This question on Quora may have the answer -- it uses pretty printer: quora.com/How-do-you-print-a-python-unicode-data-structure
I think the error has nothing to do with the print at all. It's hitting the error at the beginning of the for loop, before the print() even runs. Your edited sample code using pprint yields the same error as before, further reinforcing this claim. I'm stumped.
export PYTHONIOENCODING=utf-8 fixed my issue.
@betaRepeating "export PYTHONIOENCODING=utf-8 fixed my issue." could you explain further?
6

Another option is to cover up the errors by passing an error handler:

    import csv

    with open('some.csv', newline='', errors='replace') as f:
        reader = csv.reader(f)
        for row in reader:
            print(row)

which will replace any undecodable bytes in the file with a "missing character".
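As a rough illustration of what that handler does, the sketch below decodes the problem line from the question as ASCII with errors='replace'. It works on in-memory bytes rather than a file, and the names raw and decoded are only for this example. Each undecodable byte becomes U+FFFD (the replacement character), so the two-byte UTF-8 sequence for α turns into two replacement characters and the original letter is lost.

```python
# The second sample line from the question, encoded as UTF-8.
# "\u03b1" is the Greek letter alpha, stored as bytes 0xCE 0xB1.
raw = "this,wont,work,because,\u03b1\n".encode("utf-8")

# Decode as ASCII, substituting U+FFFD for every byte that is not valid ASCII.
decoded = raw.decode("ascii", errors="replace")

# ascii() escapes the replacement characters so the output is pure ASCII.
print(ascii(decoded))
```

Note that the open() call above does not name an encoding, so replacement only kicks in when the locale's encoding cannot decode the file. With a UTF-8 locale the file would be read correctly and nothing would be replaced.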
