
When I get an exception such as cPickle.UnpicklingError: invalid load key, 'ÿ'. and try to print it, a UnicodeDecodeError is raised as soon as I insert it into my (unicode) error message:

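import cPickle  # Python 2 only

try:
    settings = _load()  # _load, _ and askYes are application helpers (not shown)
except cPickle.UnpicklingError, err:
    msg = _(u"Error reading ... (the error is: '%s')")
    cont = askYes(msg % err, _(u"Settings Load Error"))  # raises UnicodeDecodeError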

I tried workarounds such as msg % unicode(err.message, encoding='utf-8'), but apparently err.message is not valid UTF-8 ("UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 19: invalid start byte").
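The failure is easy to reproduce in isolation. The sketch below uses Python 3 (an assumption; the thread is Python 2) and models err.message as a bytes literal. The single byte 0xff sits at index 19, and 0xff can never start a UTF-8 sequence, so a strict decode stops right there. The helper name strict_utf8_error is hypothetical.

```python
def strict_utf8_error(raw):
    """Return the UnicodeDecodeError a strict UTF-8 decode raises, or None."""
    try:
        raw.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError as exc:
        return exc
    return None


# Stand-in for Python 2's err.message: a byte string containing the raw 0xff load key.
message = b"invalid load key, '\xff'."

error = strict_utf8_error(message)
print(error)  # 'utf-8' codec can't decode byte 0xff in position 19: invalid start byte
```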

So what is the most pythonic way to handle this? Should I pass 'ignore' or 'replace' to unicode()?
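To see what those error handlers would do to the message, here is a Python 3 sketch (assumed in place of Python 2's unicode(); bytes.decode accepts the same 'ignore' and 'replace' handlers). Both handlers produce text that no longer says which byte was invalid, while repr keeps the byte value visible:

```python
# Stand-in for Python 2's err.message (a byte string).
message = b"invalid load key, '\xff'."

ignored = message.decode("utf-8", "ignore")    # the 0xff byte silently disappears
replaced = message.decode("utf-8", "replace")  # 0xff becomes U+FFFD; its value is lost
shown = repr(message)                          # the byte value survives as an escape

print(ignored)          # invalid load key, ''.
print(ascii(replaced))  # "invalid load key, '\ufffd'."
print(shown)            # b"invalid load key, '\xff'."
```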

Edit: askYes(None, msg % repr(err), _(u"Settings Load Error")) gives something like:

(the error is: 'UnpicklingError("invalid load key, '\xff'.",)'). # ff is ÿ

It doesn't blow up, but still...

Edit2: the errors I reported are a bit mixed up with artificial ones:

>>> u'%s' % "cPickle.UnpicklingError: invalid load key, 'ÿ'."
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 44: ordinal not in range(128)

That's from the interpreter inside PyCharm; apparently ÿ is '\xc3\xbf' (its UTF-8 encoding) there (...)
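The position in that traceback can be checked with a Python 3 sketch (Python 3 assumed; .decode("ascii") stands in for the implicit decode Python 2 performs when a byte string is interpolated into a unicode one). ÿ is stored as the two UTF-8 bytes 0xc3 0xbf, and the first of them sits at index 44:

```python
# The literal as typed in a UTF-8 console: '\u00ff' (LATIN SMALL LETTER Y WITH
# DIAERESIS) is stored by UTF-8 as the two bytes 0xc3 0xbf.
raw = "cPickle.UnpicklingError: invalid load key, '\u00ff'.".encode("utf-8")

position = None
try:
    # Python 2's u'%s' % raw implicitly did this, with ASCII as the default codec.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xc3 in position 44: ordinal not in range(128)
    position = exc.start

print(raw[position:position + 2])  # b'\xc3\xbf'
```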

  • Using repr, or more directly %r rather than %s, is the best way to display a string of dubious content -- it may or may not be intended to represent Unicode, but either the \x0f you show or the 0xff you mention earlier makes one ponder the encoding. If err.message is a random collection of bytes with no rhyme or reason, how could you possibly display it better than by repr?! ignore or replace would hide potentially precious information for debugging purposes -- never do that in an error message! Commented Dec 26, 2014 at 4:57
  • @AlexMartelli: thanks - yes, I would not use replace and co - I just wanted to frighten people so they would answer me :D. Could you elaborate on repr - would it be better to use repr(err.message)? I'd appreciate a full answer. Commented Dec 26, 2014 at 11:23
  • @AlexMartelli: repr(err.message): (the error is: '"invalid load key, '\x0f'."') while repr(err): (the error is: 'UnpicklingError("invalid load key, '\x0f'.",)'). I'd rather have something along the lines of ` (the error is: UnpicklingError: "invalid load key, '\x0f'.")` - do I have to construct it manually? Also, I admit that how repr() manages to decode the string escapes me. Commented Dec 26, 2014 at 13:14
  • There doesn't appear to be any Unicode problem in the string as shown: '\x0f' gives no such problem -- while '\xff' would. Try decoding err.message as 'iso-8859-1', which cannot fail (it decodes every byte, though perhaps to a nonsense glyph), and you may learn more. BTW, no surprise that repr has no problem -- repr never fails -- it's the alchemic transmutation between '\xff' and '\x0f' that leaves me puzzled! Commented Dec 26, 2014 at 15:54
  • @AlexMartelli: oh, sorry about that - I may have transmuted the error messages - apparently err.message was invalid load key, ' + chr(0xff) in the "UnicodeDecodeError: 'utf8' codec...position 19". How come repr never fails? Does it use iso-8859-1? Commented Dec 26, 2014 at 16:28
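Building the "TypeName: message" form asked about above does take a little manual work. A minimal sketch in Python 3 (an assumption), with a hypothetical helper describe_error that assumes the message is the exception's first argument (which is where Python 2's err.message came from for single-argument exceptions):

```python
import pickle


def describe_error(err):
    """Hypothetical helper: format an exception as 'TypeName: <escaped message>'.

    A bytes argument (what err.message held in Python 2) is decoded as
    latin-1, which maps every byte to one code point and so cannot fail;
    ascii() then escapes every non-ASCII character, so byte 0xff is shown
    as an escape rather than an ambiguous glyph.
    """
    message = err.args[0] if err.args else ""
    if isinstance(message, bytes):
        message = message.decode("latin-1")
    return "%s: %s" % (type(err).__name__, ascii(message))


err = pickle.UnpicklingError(b"invalid load key, '\xff'.")
print(describe_error(err))  # UnpicklingError: "invalid load key, '\xff'."
```

On the other question: repr of a byte string does not decode it with iso-8859-1 or any other codec. It copies printable ASCII bytes and writes every other byte as an escape such as \xff, which is why it cannot fail.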

2 Answers


One way to ensure you can see the result in the error message is to use repr, or more directly %r rather than %s: that never fails (because any object has a representation, and all representations are in ASCII including possibly escape sequences), and also shows (as escape sequences) characters that might otherwise be invisible.
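The "never fails" property is easy to check for byte strings. A Python 3 sketch (assumed; note that in Python 3 the repr of a str may keep printable non-ASCII characters, and ascii() is the all-ASCII variant, so the check below uses bytes, which behave like Python 2 byte strings here):

```python
# Every possible single byte, plus a few multi-byte samples (valid UTF-8 or not).
samples = [bytes([value]) for value in range(256)] + [b"\xc3\xbf", b"\xff\xfe", b"\x0f"]

for raw in samples:
    shown = "%r" % (raw,)  # identical to repr(raw): escapes bytes instead of decoding them
    assert all(ord(ch) < 128 for ch in shown), shown

print("%r" % (b"\xff",))  # b'\xff'
```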

repr (and '%r' in old-style format strings) delegates to an object type's __repr__ special method; each object type is responsible for knowing how to best represent itself in an unambiguous (not necessarily super-readable) ASCII character string. Strings and byte sequences are particularly good at that, so repr is super-suitable for them.
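A sketch of that delegation in Python 3 (assumed), with a hypothetical LoadKey type: %r calls the object's __repr__, which in turn reuses the bytes type's own escaping for its payload.

```python
class LoadKey:
    """Hypothetical type that controls its own representation via __repr__."""

    def __init__(self, raw):
        self.raw = raw  # the offending byte(s), kept as bytes

    def __repr__(self):
        # Reuse the unambiguous, ASCII-only repr of bytes for the payload.
        return "LoadKey(%r)" % (self.raw,)


key = LoadKey(b"\xff")
print(repr(key))      # LoadKey(b'\xff')
print("%r" % (key,))  # LoadKey(b'\xff')
```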

The OP has done that but does not like the aesthetics of the result (whether applied to err.message or to err). Unfortunately, aesthetics is the very least of priorities for repr: rather, it's all about complete, unambiguous information.

Another idea is to decode with a never-fail encoding (one which decodes every byte, though perhaps into a meaningless-in-context glyph), such as 'iso-8859-1'. But it's no real improvement over repr, I believe; the improvement in aesthetics is quite debatable, and there is a possibility of loss in terms of "complete, unambiguous information".
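The trade-off can be shown in a Python 3 sketch (assumed). Decoding as iso-8859-1 maps byte N to code point N, so it succeeds for all 256 byte values and round-trips exactly, yet a control byte such as 0x0f turns into a character that usually prints as nothing, whereas repr spells it out:

```python
everything = bytes(range(256))

# iso-8859-1 (latin-1) maps byte N to code point N: decoding cannot fail ...
text = everything.decode("iso-8859-1")
assert [ord(ch) for ch in text] == list(everything)
# ... and nothing is lost: encoding gives back the original bytes.
assert text.encode("iso-8859-1") == everything

# Readability is another matter: 0x0f becomes an invisible control character.
control = b"invalid load key, '\x0f'."
decoded = control.decode("iso-8859-1")
print(decoded)        # the 0x0f is there, but usually renders as nothing visible
print(repr(control))  # b"invalid load key, '\x0f'."
```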


Just to clarify some points:

Python 2.7.8 (default, Jun 30 2014, 16:03:49) [MSC v.1500 32 bit (Intel)] on win32
>>> u'%s' % "cPickle.UnpicklingError: invalid load key, 'ÿ'."
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 44: ordinal not in range(128)

That is because Python 2 helpfully tries to decode the byte string in order to insert it into the unicode string, using the default encoding, ASCII. Of course ASCII can't decode the bytes of 'ÿ' (decoding means transforming bytes into code points), hence the exception. The following works because it does not try to decode anything; it just displays the bytes, escaped as ASCII:

>>> '%s' % "cPickle.UnpicklingError: invalid load key, 'ÿ'."
"cPickle.UnpicklingError: invalid load key, '\xc3\xbf'."
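In Python 3 (assumed here for a runnable sketch), the closest analogue of formatting one byte string into another is bytes % bytes, available since Python 3.5 (PEP 461). Like Python 2's str % str, it splices the bytes in without any decoding step:

```python
raw = "cPickle.UnpicklingError: invalid load key, '\u00ff'.".encode("utf-8")

# No codec is involved: the bytes are copied as-is.
spliced = b"%s" % raw
assert spliced == raw

# The interactive echo shows repr(), which escapes the two UTF-8 bytes of U+00FF.
print(repr(spliced))  # b"cPickle.UnpicklingError: invalid load key, '\xc3\xbf'."
```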

The following also works. Because the console echoes the repr, which escapes every non-ASCII character, it displays the code point of ÿ (U+00FF) as \xff rather than its UTF-8 bytes:

>>> u'%s' % u"cPickle.UnpicklingError: invalid load key, 'ÿ'."
u"cPickle.UnpicklingError: invalid load key, '\xff'."

Same logic as in:

>>> u'á, é, í, ó, ú, ü, ñ'
u'\xe1, \xe9, \xed, \xf3, \xfa, \xfc, \xf1'
>>> 'á, é, í, ó, ú, ü, ñ'
'\xc3\xa1, \xc3\xa9, \xc3\xad, \xc3\xb3, \xc3\xba, \xc3\xbc, \xc3\xb1'
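The same distinction in a Python 3 sketch (assumed; ascii() is used because Python 3's repr would print these letters unescaped): the str holds code points such as U+00E1, and only encoding to UTF-8 produces the two-byte sequences shown for the byte string.

```python
text = "\u00e1, \u00e9, \u00ed, \u00f3, \u00fa, \u00fc, \u00f1"  # the accented letters above

# A unicode string holds code points; ascii() escapes them the way Python 2's
# unicode repr did (\xe1 is code point U+00E1).
print(ascii(text))  # '\xe1, \xe9, \xed, \xf3, \xfa, \xfc, \xf1'

# Encoding to UTF-8 turns each of those code points into two bytes, which is
# what a Python 2 byte-string literal typed in a UTF-8 console contained.
print(text.encode("utf-8"))
# b'\xc3\xa1, \xc3\xa9, \xc3\xad, \xc3\xb3, \xc3\xba, \xc3\xbc, \xc3\xb1'
```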

It is this internal encoding/decoding that confused me - and still puzzles me a bit.
