
I have scripts that print messages through the logging system or, sometimes, with print commands. On the Windows console I get error messages like this:

Traceback (most recent call last):
  File "C:\Python32\lib\logging\__init__.py", line 939, in emit
    stream.write(msg)
  File "C:\Python32\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 4537: character maps to <undefined>

Is there a general way to make all encodings in the logging system, print commands, etc. fail-safe (ignore errors)?

1 Answer


The problem is that your terminal/shell (cmd, as you are on Windows) cannot print every Unicode character.

You can encode your strings in a fail-safe way with the errors argument of the str.encode method. For example, you can replace unsupported characters with ? by setting errors='replace'.

>>> s = u'\u2019'
>>> print s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2019' in position 0: character maps to <undefined>
>>> print s.encode('cp850', errors='replace')
?

See the documentation for other options.
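As a rough illustration of those other options, the sketch below encodes the same character with several of the standard error handlers. It assumes Python 3 syntax (unlike the Python 2 session above), and the sample string is made up for the example.

```python
# Sample text containing U+2019, which cp850 cannot represent.
s = "It\u2019s"

# Substitute a question mark for each unsupported character.
replaced = s.encode("cp850", errors="replace")

# Drop unsupported characters entirely.
ignored = s.encode("cp850", errors="ignore")

# Keep the information as a Python-style escape sequence.
escaped = s.encode("cp850", errors="backslashreplace")

# Keep the information as an XML character reference.
xml = s.encode("cp850", errors="xmlcharrefreplace")

for name, value in [("replace", replaced), ("ignore", ignored),
                    ("backslashreplace", escaped), ("xmlcharrefreplace", xml)]:
    print(name, value)
```

Which handler fits best depends on whether the lost character only needs a placeholder (replace, ignore) or should stay recoverable from the output (backslashreplace, xmlcharrefreplace).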

Edit: If you want a general solution for logging, you can subclass StreamHandler:

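import logging

class CustomStreamHandler(logging.StreamHandler):

    def format(self, record):
        # A LogRecord has no encode(); sanitise the formatted text instead.
        text = logging.StreamHandler.format(self, record)
        # Round-trip through cp850 so unsupported characters become '?'.
        return text.encode('cp850', errors='replace').decode('cp850')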
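
A minimal sketch of how such a handler might be wired up and checked, assuming Python 3. The class name ReplacingStreamHandler, the logger name encoding_demo and the in-memory stand-in for the console are all hypothetical. A LogRecord itself has no encode method, so this sketch sanitises the formatted text in format(), which StreamHandler.emit calls before writing.

```python
import io
import logging


class ReplacingStreamHandler(logging.StreamHandler):
    """Hypothetical handler: unencodable characters are written as '?'."""

    def format(self, record):
        text = super().format(record)
        # Use the stream's own encoding when it has one; fall back to cp850.
        encoding = getattr(self.stream, "encoding", None) or "cp850"
        # Round-trip so the result is still a str, not bytes.
        return text.encode(encoding, errors="replace").decode(encoding)


# Simulate a cp850 console with an in-memory stream (strict error handling).
console = io.TextIOWrapper(io.BytesIO(), encoding="cp850", newline="")
handler = ReplacingStreamHandler(console)

logger = logging.getLogger("encoding_demo")
logger.propagate = False  # keep the demo output out of other handlers
logger.addHandler(handler)
logger.warning("It\u2019s broken")

# Read back what the handler wrote to the simulated console.
console.seek(0)
output = console.read()
print(repr(output))
```

Because the text is decoded again after encoding, the message stays a str throughout, so nothing downstream has to deal with bytes.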

5 Comments

But if I pre-encode all strings, their type changes (to bytes), which might change how they behave internally. Also, the error is raised inside the built-in codec library, which I cannot change. Can I set an option in the codec?
Edited my answer with a general logging solution.
And is there a general solution so that I don't have to change code at different places (substitute handlers)? Maybe some global option for encoding errors?
If you don't use multiple loggers (by using getLogger), you only have to set the handler once. If you use multiple loggers, you can use setLoggerClass with a custom class that uses the handler.
This answer seems to get the job done, quite effectively.
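
The "set the handler once" point from the comments can be sketched as follows, assuming Python 3 and the default propagation behaviour of the logging module. A plain StreamHandler writing to an in-memory stream stands in for the custom handler, and the logger names app.db and app.ui are made up.

```python
import io
import logging

# Stand-in for the console; a custom handler would be attached the same way.
stream = io.StringIO()
handler = logging.StreamHandler(stream)

# Attach the handler once, on the root logger.
root = logging.getLogger()
root.addHandler(handler)

# Loggers obtained elsewhere need no handler of their own:
# their records propagate up to the root logger's handlers.
logging.getLogger("app.db").warning("from db")
logging.getLogger("app.ui").warning("from ui")

# Detach again so the demo leaves the root logger as it found it.
root.removeHandler(handler)
captured = stream.getvalue()
print(captured)
```

This only covers output that goes through logging; print calls write to sys.stdout directly and are not affected by any handler.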
