I'm having an issue with Python 2.7 complaining that I have no encoding declared; however, it is in fact declared. I'm running this on OS X El Capitan (10.11.3) with Python 2.7.11.

I'm attempting to search a data set for specific Chinese and English terms. report.csv contains the data I want to search, and raw_terms.txt contains the Chinese and English terms, one per line. Both files were saved as UTF-8.

I've noticed this code works on other machines, but not on mine. I assume something I've changed in the year or more I've had this laptop is causing this, but I'm unsure where to start looking.

Script:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import csv

count = 0
with open('./data/report.csv', 'rb') as c:
    csv_data = csv.DictReader(c, delimiter=',', quoting=csv.QUOTE_ALL)
    for data in csv_data:
        with open('./terms/raw_terms.txt', 'r') as f:
            for term in f:
                term = term.strip()
                if term in data['Description']: #or term in '你好!你好吗':
                    # print 'Found \"%s\" in \"%s\"' % (term, data['Subject'])
                    count += 1
                else:
                    continue

print count

Error:

File "t.py", line 1
SyntaxError: Non-ASCII character '\xfe' in file t.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

I'd appreciate any help or direction anyone can provide.

  • You have not declared an encoding. The coding: comment applies to your code, not to the file you are opening from the code. Anyway, the CSV module has well-documented trouble with Unicode; look for the many, many duplicates here. Commented Mar 16, 2016 at 15:21
  • I have tried numerous other techniques, including the codecs module (with codecs.open(file_location, 'rb', 'UTF-8') as f:) and .encode('unicode-escape'). Also, it's the non-CSV file I'm getting the error for. Commented Mar 16, 2016 at 15:24
  • The error message suggests that your source file has a Unicode BOM, and that it is in fact not in UTF-8. If it were, the first character would be \xef, not \xfe. Probably your file is in UTF-16. Try to save it as UTF-8 without a BOM. Commented Mar 16, 2016 at 15:24
  • Thanks @tripleee, that's weird. I used Sublime Text 2 to save the file via "Save with Encoding" > UTF-8, not the "UTF-8 with BOM" option. Any suggestions on how I could save this file properly? Commented Mar 16, 2016 at 15:27
  • I have no immediate solution, but a hex dump of the first few bytes of the file may help reveal what exactly you have. Commented Mar 16, 2016 at 15:34
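Following the hex-dump suggestion, here is a small sketch that prints the first bytes of a file in hex and names any byte-order mark it recognizes. The names `describe_head`, `sniff_file`, and the script name `sniff_bom.py` are hypothetical; only the `codecs.BOM_*` constants and `binascii.hexlify` come from the standard library. It should run under Python 2.7 as well as Python 3.

```python
import binascii
import codecs
import sys

# Known byte-order marks, checked longest first: UTF-32 LE (FF FE 00 00)
# would otherwise be mistaken for UTF-16 LE (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, 'UTF-32 BE'),
    (codecs.BOM_UTF32_LE, 'UTF-32 LE'),
    (codecs.BOM_UTF8, 'UTF-8 with BOM'),
    (codecs.BOM_UTF16_BE, 'UTF-16 BE'),
    (codecs.BOM_UTF16_LE, 'UTF-16 LE'),
]


def describe_head(head):
    """Return (hex dump, BOM name or None) for the leading bytes of a file."""
    name = None
    for bom, label in BOMS:
        if head.startswith(bom):
            name = label
            break
    return binascii.hexlify(head).decode('ascii'), name


def sniff_file(path, nbytes=16):
    """Read the first `nbytes` bytes of `path` (hypothetical helper) and describe them."""
    with open(path, 'rb') as f:
        return describe_head(f.read(nbytes))


if __name__ == '__main__':
    # Usage: python sniff_bom.py t.py
    for path in sys.argv[1:]:
        hexdump, bom = sniff_file(path)
        print('%s: %s (BOM: %s)' % (path, hexdump, bom or 'none'))
```

For a script saved as UTF-16 BE, the dump should start with `feff`, which lines up with the `'\xfe'` reported in the SyntaxError.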

1 Answer

Your exception is caused by non-ASCII bytes in your source code. In your case, it appears the file has been saved as UTF-16 BE with a BOM: the '\xfe' in the error is the first byte of the UTF-16 BE BOM, FE FF.

Unfortunately, the coding declaration has to come before any non-ASCII bytes, which is impossible here because the BOM has to sit at byte 0. It's a catch-22. (Python 2 treats only a UTF-8 BOM specially; a UTF-16 BOM is just two non-ASCII bytes on line 1.)
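To see why the declaration can't rescue a UTF-16 file, here is a rough model of the check, using the regular expression PEP 263 specifies for the first two lines and applying it to raw bytes. It is a simplification for illustration, not CPython's actual tokenizer; `declared_encoding` is a hypothetical name. Once the header is stored as UTF-16 BE, every ASCII character is preceded by a null byte, so the bytes no longer spell `coding: utf-8` and nothing matches.

```python
import codecs
import re

# PEP 263: the first or second line must match this pattern for a
# coding declaration to be recognized. Here it is applied to raw bytes.
CODING_RE = re.compile(br'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')


def declared_encoding(source_bytes):
    """Return the encoding declared in the first two lines, or None."""
    for line in source_bytes.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode('ascii')
    return None


header = u'#!/usr/bin/python\n# -*- coding: utf-8 -*-\n'

# The header as stored in a UTF-8 file without a BOM.
as_utf8 = header.encode('utf-8')
# The same header stored as UTF-16 BE with a BOM (FE FF).
as_utf16be = codecs.BOM_UTF16_BE + header.encode('utf-16-be')

print(declared_encoding(as_utf8))     # utf-8
print(declared_encoding(as_utf16be))  # None
```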

Your only option is to change the file to an encoding that doesn't need a BOM, such as UTF-8. In Sublime Text, simply choose File -> Save with Encoding -> UTF-8.

On the command line, you can re-encode the file with iconv and then drop the first three bytes, which hold the UTF-8 encoding of the BOM (EF BB BF):

iconv -f UTF-16BE -t UTF-8 test42.py | tail -c +4 > test43.py
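If iconv isn't handy, the same conversion can be sketched in Python. This assumes the input really is UTF-16 BE, as the error suggests; `utf16be_to_utf8` and `convert_file` are hypothetical helper names. The `utf-16-be` codec does not consume the BOM (it decodes to U+FEFF), so the sketch strips it explicitly, which has the same effect as `tail -c +4` on the UTF-8 output.

```python
import io


def utf16be_to_utf8(raw):
    """Re-encode UTF-16 BE bytes as UTF-8, dropping a leading BOM if present."""
    text = raw.decode('utf-16-be')
    # 'utf-16-be' keeps the BOM as U+FEFF, so remove it by hand.
    if text.startswith(u'\ufeff'):
        text = text[1:]
    return text.encode('utf-8')


def convert_file(src_path, dst_path):
    """Convert src_path (UTF-16 BE) into dst_path (UTF-8, no BOM)."""
    with io.open(src_path, 'rb') as src:
        data = utf16be_to_utf8(src.read())
    with io.open(dst_path, 'wb') as dst:
        dst.write(data)


# Usage, with the same example file names as the iconv command:
# convert_file('test42.py', 'test43.py')
```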

Also, heed @tripleee's comment about the csv module in Python 2.x. Instead of csv, consider https://github.com/jdunck/python-unicodecsv, a Unicode-compatible drop-in replacement.
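A related point from the first comment: the coding line only affects how Python reads the script itself, so terms read from raw_terms.txt with a plain open() arrive as byte strings in Python 2, and mixing them with decoded Unicode text in an `in` test can raise UnicodeDecodeError. Below is a minimal sketch, assuming the Description values have already been decoded to Unicode (for example via unicodecsv); `load_terms` and `count_matches` are hypothetical names. It also loads the terms once instead of reopening the file for every CSV row, and skips blank lines, since an empty term is "in" every description and would inflate the count.

```python
# -*- coding: utf-8 -*-
import io


def load_terms(path):
    """Read newline-separated search terms from `path` as Unicode strings."""
    # 'utf-8-sig' decodes UTF-8 and drops a leading UTF-8 BOM if present.
    with io.open(path, encoding='utf-8-sig') as f:
        # Skip blank lines: an empty term would match every description.
        return [line.strip() for line in f if line.strip()]


def count_matches(descriptions, terms):
    """Count (description, term) pairs where the term occurs in the description."""
    count = 0
    for description in descriptions:
        for term in terms:
            if term in description:
                count += 1
    return count


if __name__ == '__main__':
    # Usage with inline data instead of the real files.
    print(count_matches([u'你好!你好吗', u'hello world'], [u'你好', u'hello', u'bye']))  # 2
```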
