Decode a Python string

Question

Sorry for the generic title.

I am receiving a string from an external source: txt = external_func()

I am copying/pasting the output of various commands to make sure you see what I'm talking about:

In [163]: txt
Out[163]: '\\xc3\\xa0 voir\\n'

In [164]: print(txt)
\xc3\xa0 voir\n

In [165]: repr(txt)
Out[165]: "'\\\\xc3\\\\xa0 voir\\\\n'"

I am trying to transform that text to UTF-8 (?) to have txt = "à voir\n", and I can't see how.

How can I do transformations on this variable?

kalehmann · Accepted Answer · 2018-12-17 14:20:20Z


You can encode txt to a bytes object using the encode method of the str class. That bytes object can then be decoded with the unicode_escape codec.

Now all escape sequences in your string are parsed, but unicode_escape decodes each byte as Latin-1, so the two UTF-8 bytes \xc3 and \xa0 come out as two separate characters instead of à. To fix that, encode the string with latin-1 to get the raw bytes back, then decode those bytes as utf-8.

>>> txt = '\\xc3\\xa0 voir\\n'
>>> txt.encode('utf-8').decode('unicode_escape').encode('latin-1').decode('utf-8')
'à voir\n'
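
To see why each step is needed, the chain above can be split into its intermediate values. This is only a sketch: the variable names raw, unescaped, as_bytes and result are made up for illustration.

```python
# The string as received: literal backslashes, not real escape sequences.
txt = '\\xc3\\xa0 voir\\n'

# 1. str -> bytes. The text is pure ASCII, so nothing changes yet.
raw = txt.encode('utf-8')                 # b'\\xc3\\xa0 voir\\n'

# 2. Parse the escape sequences. Each \xNN becomes one character
#    with code point NN, which is the Latin-1 interpretation.
unescaped = raw.decode('unicode_escape')  # 'Ã\xa0 voir\n'

# 3. Map each character back to the single byte it stands for.
as_bytes = unescaped.encode('latin-1')    # b'\xc3\xa0 voir\n'

# 4. Interpret those bytes as UTF-8.
result = as_bytes.decode('utf-8')         # 'à voir\n'

print(repr(result))
```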

The codecs module also has an undocumented function called escape_decode. It returns a tuple of the decoded bytes and the number of bytes consumed, hence the [0]:

>>> import codecs
>>> codecs.escape_decode(bytes('\\xc3\\xa0 voir\\n', 'utf-8'))[0].decode('utf-8')
'à voir\n'
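
If you need this conversion in several places, it may be convenient to wrap it in a small helper. The sketch below assumes the input only ever contains escaped UTF-8 bytes, as in the question. The function name unescape_utf8 is hypothetical, and since escape_decode is undocumented, its behaviour could change between Python versions.

```python
import codecs


def unescape_utf8(text: str) -> str:
    """Turn literal escapes such as '\\xc3\\xa0' into the UTF-8 text they encode.

    Hypothetical helper; relies on the undocumented codecs.escape_decode.
    """
    # escape_decode works on bytes and returns (decoded_bytes, length_consumed).
    raw, _consumed = codecs.escape_decode(text.encode('utf-8'))
    # The resulting bytes are assumed to be valid UTF-8.
    return raw.decode('utf-8')


print(repr(unescape_utf8('\\xc3\\xa0 voir\\n')))
```

If the escaped bytes are not valid UTF-8, the final decode raises UnicodeDecodeError, so callers may want to catch that.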


2 Comments

Be Chiller Too Over a year ago

Wow, a bit messy/convoluted, but thank you very much!

kalehmann Over a year ago

I am glad I could help. You are right about the messy part, but I could not find any function that does it all in one step.
