In Python I use an HTML template to display a Steam player's information.

The template is:

'''<td>
<div>
Name: %s<br>
Hours: %s<br>
<a href="http://steamcommunity.com/profiles/%s" target="_blank">Steam Profile</a> <br>
</div>
</td>'''

So I fill it in with TEMPLATE % (personaName, tf2Hours, id64)

Later on, the filled-in template is saved to an HTML file.

Occasionally this raises a UnicodeDecodeError, because personaName can contain non-ASCII characters.

Is there a way to avoid this while still having the correct characters in the final html file?

EDIT:

The reason for the error was non-ASCII bytes in personaName that could not be decoded to unicode.

Doing unicode(personaName, errors='ignore') solved the issue.

1 Answer

Try:

 u'UnicodeTextHereaあä'.encode('ascii', 'ignore')

This drops any Unicode characters that cannot be encoded as ASCII.

Here are a few examples that I just tried.

>>> x = 'Hello world!'
>>> y = 'notあä ascii'
>>> x.encode('ascii', 'ignore')
b'Hello world!'
>>> y.encode('ascii', 'ignore')
b'not ascii'

As you can see, it removed every non-ASCII character, so those characters will not appear in the output.
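If the characters should still show up in the final HTML rather than be dropped, one possible variation is the 'xmlcharrefreplace' error handler, which replaces each non-ASCII character with an HTML numeric character reference. This is only a sketch for Python 3; the helper name to_ascii_html is made up for illustration and is not part of the original question.

```python
def to_ascii_html(text):
    """Encode text as ASCII bytes, turning each non-ASCII character
    into an HTML numeric character reference such as &#228;."""
    # 'xmlcharrefreplace' is a standard error handler for str.encode().
    return text.encode('ascii', 'xmlcharrefreplace')


if __name__ == '__main__':
    # Escapes are used so the example does not depend on the source file encoding.
    name = 'not\u3042\u00e4 ascii'  # same text as y in the examples above
    print(to_ascii_html(name))
```

A browser renders these references as the original characters, so the output stays pure ASCII without losing information.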


Alternatively, you can state the file's encoding when you open it, so the contents are decoded to Unicode text as they are read. For example (from docs.python.org/3.3/howto/unicode.html),

with open('unicode.txt', encoding='utf-8') as f:
    for line in f:
        print(repr(line))

This decodes the file as UTF-8, so non-ASCII characters are read as they are instead of being dropped.
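The same encoding argument works when writing, which may be closer to what the question needs. The sketch below assumes personaName arrives either as text or as UTF-8 encoded bytes (the question does not say which encoding is used); the function names render_player and save_html are hypothetical.

```python
import io

TEMPLATE = u'''<td>
<div>
Name: %s<br>
Hours: %s<br>
<a href="http://steamcommunity.com/profiles/%s" target="_blank">Steam Profile</a> <br>
</div>
</td>'''


def render_player(persona_name, tf2_hours, id64):
    """Fill in the template, decoding the name first if it is a byte string."""
    if isinstance(persona_name, bytes):
        # Assumption: the bytes are UTF-8. Undecodable bytes become U+FFFD
        # instead of raising UnicodeDecodeError.
        persona_name = persona_name.decode('utf-8', 'replace')
    return TEMPLATE % (persona_name, tf2_hours, id64)


def save_html(path, html):
    """Write the rendered text to disk as UTF-8."""
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(html)
```

For the browser to show the characters correctly, the page also has to declare UTF-8, for example with a meta charset tag in its head.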

2 Comments

'Dawn ♈ Of ♈ The ♈ Dusk'.encode('ascii', 'ignore') raises an error. I have absolutely no idea why and it is really frustrating D:
How about this? line.decode('utf-8', 'ignore').encode('ascii', 'ignore')
