
Sorry for the generic title.

I am receiving a string from an external source: txt = external_func()

I am copying/pasting the output of various commands to make sure you see what I'm talking about:

In [163]: txt
Out[163]: '\\xc3\\xa0 voir\\n'

In [164]: print(txt)
\xc3\xa0 voir\n

In [165]: repr(txt)
Out[165]: "'\\\\xc3\\\\xa0 voir\\\\n'"

I am trying to transform that text to UTF-8 (?) to have txt = "à voir\n", and I can't see how.

How can I do transformations on this variable?

1 Answer


You can encode txt to a bytes object with the encode method of the str class, then decode that bytes object with the unicode_escape codec.

That parses all the escape sequences, but unicode_escape treats its input as Latin-1, so every \xNN escape becomes one character rather than one byte of a UTF-8 sequence. To undo that, encode the result with latin-1 to recover the raw bytes, then decode those bytes as utf-8.

>>> txt = '\\xc3\\xa0 voir\\n'
>>> txt.encode('utf-8').decode('unicode_escape').encode('latin-1').decode('utf-8')
'à voir\n'
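
To make the chain less opaque, here is the same conversion split into its four steps, with the intermediate value shown after each one. This is only an illustrative sketch; the variable names (raw, escaped, utf8_bytes, result) are mine and not part of any API.

```python
# The string as received: it contains literal backslashes, not real bytes.
txt = '\\xc3\\xa0 voir\\n'

# Step 1: str -> bytes. The text is pure ASCII, so the bytes mirror the characters.
raw = txt.encode('utf-8')                # b'\\xc3\\xa0 voir\\n'

# Step 2: parse the escape sequences. Each \xNN becomes the code point U+00NN.
escaped = raw.decode('unicode_escape')   # 'Ã\xa0 voir\n'

# Step 3: latin-1 maps code points 0-255 back to single bytes of the same value.
utf8_bytes = escaped.encode('latin-1')   # b'\xc3\xa0 voir\n'

# Step 4: the bytes are valid UTF-8, so decoding them gives the intended text.
result = utf8_bytes.decode('utf-8')      # 'à voir\n'

print(repr(result))
```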

The codecs module also has an undocumented function called escape_decode. Since it is undocumented, treat it as an implementation detail that may change:

>>> import codecs
>>> codecs.escape_decode(bytes('\\xc3\\xa0 voir\\n', 'utf-8'))[0].decode('utf-8')
'à voir\n'
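
If you need this in more than one place, the encode/decode chain can be wrapped in a small helper. The sketch below is an assumption about how you might package it: unescape_utf8 is a hypothetical name, not a standard library function. It encodes with ascii rather than utf-8 so that input already containing non-ASCII characters fails loudly instead of being silently mangled by unicode_escape.

```python
def unescape_utf8(text: str) -> str:
    """Hypothetical helper: turn literal \\xNN escapes of UTF-8 bytes into text.

    Raises UnicodeEncodeError if text contains non-ASCII characters, and
    UnicodeDecodeError if the escaped bytes are not valid UTF-8.
    """
    raw = text.encode('ascii')               # only escaped ASCII input is accepted
    parsed = raw.decode('unicode_escape')    # \xNN -> code point U+00NN
    return parsed.encode('latin-1').decode('utf-8')


print(unescape_utf8('\\xc3\\xa0 voir\\n') == 'à voir\n')
```

Note that escapes for code points above U+00FF (for example a literal \u20ac) would make the latin-1 step raise UnicodeEncodeError, so this sketch only suits input where every escape stands for a single byte.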

2 Comments

Wow, a bit messy/convoluted, but thank you very much!
I am glad I could help. You are right about the messy part, but I could not find any function that does it all in one step.
