I'm getting an encoding error from a script, as follows:

from django.template import loader, Context
t = loader.get_template(filename)
c = Context({'menus': menus})
print t.render(c)
  File "../django_to_html.py", line 45, in <module>
    print t.render(c)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 34935: ordinal not in range(128)

I don't own the script, so I can't edit it. The only thing I can do is change the file I supply so that it doesn't contain the Unicode character the script is objecting to.

This file is a text file that I'm editing in TextMate. What can I do to identify and get rid of the character that the script is barfing on?

Could I use something like iconv, and if so how?

Thanks!

4 Answers

3

How to find ALL the nasties in your file:

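import sys
import unicodedata as ucd

# Python 2. Pass the path of the text file as the first argument.
with open(sys.argv[1], 'rb') as f:
    for linex, line in enumerate(f):
        # Assumes the file is UTF-8 encoded.
        uline = line.decode('UTF-8')
        bad_line = False
        for charx, char in enumerate(uline):
            # The 'ascii' codec rejects everything above U+007F,
            # so report every non-ASCII character.
            if char <= u'\x7f':
                continue
            print "line %d, column %d: %s" % (
                linex+1, charx+1, ucd.name(char, '<unknown>'))
            bad_line = True
        if bad_line:
            # Show the whole line with the offenders escaped.
            print repr(uline)
            print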

Sample output:

line 1, column 6: RIGHT SINGLE QUOTATION MARK
line 1, column 10: SINGLE LOW-9 QUOTATION MARK
u'yadda\u2019foo\u201abar\r\n'

line 2, column 4: IDEOGRAPHIC SPACE
u'fat\u3000space\r\n'
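
Once the scan has named the offenders, one way to get rid of them is a translation table. This is only a minimal sketch, written to run on Python 3: the names `REPLACEMENTS` and `asciify` are made up for illustration, and mapping the typographic quotes to a plain apostrophe is an assumption about what is acceptable in your text.

```python
# Hypothetical mapping: extend it with whatever the scan reports.
REPLACEMENTS = {
    0x2019: u"'",  # RIGHT SINGLE QUOTATION MARK
    0x201A: u"'",  # SINGLE LOW-9 QUOTATION MARK
    0x3000: u" ",  # IDEOGRAPHIC SPACE
}

def asciify(text):
    """Replace known characters, then mark any remaining non-ASCII with '?'."""
    cleaned = text.translate(REPLACEMENTS)
    return cleaned.encode("ascii", "replace").decode("ascii")

print(asciify(u"yadda\u2019foo\u201abar"))
print(asciify(u"fat\u3000space"))
```

Anything not listed in the table comes out as `?`, which makes leftovers easy to search for in the editor.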

2

I don't know why you're using Django's template engine to create console output, but the Python wiki shows a way to work around this on Windows using a Python-specific environment variable:

set PYTHONIOENCODING=utf_8

This sets the stdout/stderr encoding to UTF-8, so any Unicode character can be printed. Because the Windows console encoding is usually not UTF-8, special characters will show up as their raw UTF-8 bytes misread in the console's code page. For example:

>>> print u'\u2019'
ΓÇÖ
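
Since the variable is read by the interpreter at startup, it can be set from outside without touching the script (in a Unix-style shell the equivalent of `set` is `export PYTHONIOENCODING=utf_8`). As a rough sketch of the effect, written for Python 3, the following launches a child interpreter with the variable set and captures what it prints. The helper name `run_with_utf8_stdout` and the one-line child program are stand-ins for illustration, not the real script.

```python
import os
import subprocess
import sys

def run_with_utf8_stdout(args):
    """Run a command with PYTHONIOENCODING=utf-8 and return its raw stdout bytes."""
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    return subprocess.check_output(args, env=env)

# Stand-in child process that prints the character from the traceback.
child = [sys.executable, "-c", "print(u'\\u2019')"]
output = run_with_utf8_stdout(child)
print(repr(output))
```

The captured bytes are the UTF-8 encoding of the character, which is what a UTF-8 terminal or a redirected file would receive.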

2 Comments

I'm not on Windows unfortunately, I'm on OSX.
@AP257: I don't think that makes a difference. Your problem stays the same - and setting env variables should be possible in Mac OSX, too?!

1

The character is in position 34935 in the file. The helpful traceback tells you that.
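
The position in a `UnicodeEncodeError` is an index into the string that was being encoded, and the exception object carries it as `start` and `end`. A small sketch of using that to show the offending character with some context, written for Python 3 (the function name `show_encode_failure` and the sample string are invented for illustration):

```python
def show_encode_failure(text, width=20):
    """Return (position, character, context) for the first non-ASCII character, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as err:
        start = max(err.start - width, 0)
        return err.start, text[err.start], text[start:err.end + width]
    return None

sample = u"It" + u"\u2019" + u"s on the menu"
print(show_encode_failure(sample))
```

Searching the text file for the few words of context around the character is usually quicker than counting to the position by hand.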

1 Comment

Actually it's the position in the rendered output, not in the template file. But that should help, too.

0

\u2019 is a right single quotation mark (http://www.unicode.org/charts/ has a helpful search box where you can enter the code), so maybe that'll help track it down. If your file ends up as HTML again, you could use the &#8217; notation for these characters. (As John points out, the notation also accepts hex.)
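
If the text is headed for HTML anyway, Python can produce these numeric character references itself through the built-in `xmlcharrefreplace` error handler, which emits the decimal form. A minimal sketch, written for Python 3, with `to_char_refs` as a made-up helper name:

```python
def to_char_refs(text):
    """Encode to ASCII, turning each non-ASCII character into a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_char_refs(u"today\u2019s menu"))
```

The result is pure ASCII, so an 'ascii' codec downstream has nothing left to object to.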

2 Comments

No need to convert; use &#x2019;
@John: Cheers, hadn't come across that one!
