Avoiding UnicodeDecodeError exceptions in Python

Question

In Python I use an HTML template to display a Steam player's information.

The template is:

'''<td>
<div>
Name: %s<br>
Hours: %s<br>
<a href="http://steamcommunity.com/profiles/%s" target="_blank">Steam Profile</a> <br>
</div>
</td>'''

I fill it in with TEMPLATE % (personaName, tf2Hours, id64).
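For comparison, here is a minimal Python 3 sketch of the same substitution. The values for personaName, tf2Hours and id64 are hypothetical placeholders, not real Steam data. In Python 3 the template and the arguments are all str (Unicode) objects, so the %-formatting step does not do any implicit ASCII decoding. In Python 2, UnicodeDecodeError usually comes from a byte string that contains non-ASCII bytes being implicitly decoded as ASCII.

```python
# Python 3 sketch: str is Unicode, so %-formatting a non-ASCII name
# involves no implicit ASCII decoding.
TEMPLATE = '''<td>
<div>
Name: %s<br>
Hours: %s<br>
<a href="http://steamcommunity.com/profiles/%s" target="_blank">Steam Profile</a> <br>
</div>
</td>'''

# Hypothetical sample values standing in for real Steam data.
personaName = 'Dawn ♈ Of ♈ The ♈ Dusk'
tf2Hours = 123
id64 = '76561198000000000'

html = TEMPLATE % (personaName, tf2Hours, id64)
print(html)
```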

Later on, the filled-in template is saved to an HTML file.

Occasionally this raises a UnicodeDecodeError, because personaName can contain unusual (non-ASCII) characters.

Is there a way to avoid this while still having the correct characters in the final html file?

EDIT:

The error was caused by non-ASCII bytes in personaName, which was a byte string rather than a Unicode string.

Calling unicode(personaName, errors='ignore') solved the issue.
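The `errors='ignore'` fix stops the exception, but it does not keep "the correct characters". In Python 2, `unicode()` without an explicit encoding uses the default ASCII codec, so every non-ASCII byte is dropped. The Python 3 sketch below shows the difference. It assumes, hypothetically, that the raw name arrives as UTF-8 encoded bytes. `bytes.decode` stands in for Python 2's `unicode(s, encoding, errors)`.

```python
# Hypothetical raw bytes as they might arrive from a web API (UTF-8 encoded).
raw = 'Dawn ♈ Of ♈ The ♈ Dusk'.encode('utf-8')

# Analogue of Python 2's unicode(raw, errors='ignore'), whose default codec
# is ASCII: every non-ASCII byte is silently discarded.
lossy = raw.decode('ascii', errors='ignore')

# Decoding with the encoding the data was actually written in keeps every character.
exact = raw.decode('utf-8')

print(lossy)  # Dawn  Of  The  Dusk
print(exact)  # Dawn ♈ Of ♈ The ♈ Dusk
```

If the source encoding really is UTF-8, then in Python 2 the equivalent lossless call would be `unicode(personaName, 'utf-8')` or `personaName.decode('utf-8')`.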

farmdev.com/talks/unicode

– Ignacio Vazquez-Abrams, commented Dec 1, 2013 at 2:49

Matt · Accepted Answer · 2013-12-01 04:30:44Z


Try:

 u'UnicodeTextHereaあä'.encode('ascii', 'ignore')

This drops any characters that cannot be encoded as ASCII.

Here are a few examples that I just tried.

>>> x = 'Hello world!'
>>> y = 'notあä ascii'
>>> x.encode('ascii', 'ignore')
b'Hello world!'
>>> y.encode('ascii', 'ignore')
b'not ascii'

As you can see, every non-ASCII character was removed.

Alternatively, you can tell Python which encoding to use when opening a file, so it decodes the contents to Unicode for you. For example (from docs.python.org/3.3/howto/unicode.html):

with open('unicode.txt', encoding='utf-8') as f:
    for line in f:
        print(repr(line))

This decodes each line from UTF-8, so you can read the Unicode text as-is.
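The same `encoding=` argument works in the other direction. The question's goal is to save the filled-in template with its characters intact, and passing `encoding='utf-8'` when writing does that. Below is a minimal Python 3 sketch. The HTML fragment and the output filename `players.html` are hypothetical, and a temporary directory keeps the example self-contained.

```python
import os
import tempfile

# Hypothetical filled-in template fragment containing non-ASCII characters.
html = '<td><div>Name: Dawn ♈ Of ♈ The ♈ Dusk<br></div></td>'

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'players.html')

# Python encodes the str to UTF-8 bytes as it writes, so nothing is dropped.
with open(path, 'w', encoding='utf-8') as f:
    f.write(html)

# Reading it back with the same encoding returns the identical text.
with open(path, encoding='utf-8') as f:
    round_trip = f.read()

print(round_trip == html)  # True
```

A browser also has to know the page is UTF-8, for example through a `<meta charset="utf-8">` tag in the page's head. Otherwise it may display the correctly stored bytes as mojibake.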

edited Dec 1, 2013 at 4:30

answered Dec 1, 2013 at 2:54



2 Comments

user2430629 Over a year ago

'Dawn ♈ Of ♈ The ♈ Dusk'.encode('ascii', 'ignore') returns an error. I have absolutely no idea why and it is really frustrating D:

JimB Over a year ago

How about this? line.decode('utf-8', 'ignore').encode('ascii', 'ignore')
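In Python 2 the call in the first comment fails because the literal is a byte string. Calling `.encode()` on a Python 2 `str` first decodes it implicitly with the ASCII codec, and `'ignore'` only applies to the encode step. So the first non-ASCII byte raises UnicodeDecodeError. JimB's suggestion avoids this by decoding explicitly. Below is a Python 3 sketch of the same idea. It assumes the name arrives as UTF-8 bytes containing ♈ (U+2648); the variable names are illustrative.

```python
# Hypothetical UTF-8 bytes, like the Python 2 str object in the comment above.
line = 'Dawn ♈ Of ♈ The ♈ Dusk'.encode('utf-8')

# In Python 3, bytes has no .encode() method at all, so the Python 2 trap
# of an implicit ASCII decode cannot happen.
print(hasattr(line, 'encode'))  # False

# JimB's approach: decode the bytes explicitly, then encode to ASCII.
ascii_only = line.decode('utf-8', 'ignore').encode('ascii', 'ignore')
print(ascii_only)  # b'Dawn  Of  The  Dusk'
```

The result is still lossy: the ♈ characters are gone. Stopping after the `decode('utf-8')` step keeps them.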
