5

I have this string:

V posledn\u00edch m\u011bs\u00edc\u00edch se bezpe\u010dnostn\u00ed situace v Libyi zna\u010dn\u011b zhor\u0161ila, o \u010dem\u017e sv\u011bd\u010d\u00ed i ned\u00e1vn\u00e9 n\u00e1hl\u00e9 opu\u0161t\u011bn\u00ed zem\u011b nejen \u010desk\u00fdmi diplomaty. Libyi hroz\u00ed nekontrolovan\u00fd rozpad a nekone\u010d

Which should read "V posledních měsících se ..." so \u00ed is í and \u011b is ě.

Any idea how to decode this in Python? The string comes from JavaScript code that I am parsing in Python. I could write my own ad-hoc solution, since only a dozen or so accented Czech characters are escaped, but that seems ugly.

3 Answers

11

Decode it using the 'unicode-escape' codec. If x is your string (a Python 2 byte string), use x.decode('unicode-escape').
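For readers on Python 3, where str has no .decode() method, the same idea might look like the sketch below. The helper name decode_escapes and the shortened sample string are illustrative assumptions, not part of the original answer, and the sketch assumes the input holds only ASCII characters plus backslash escapes.

```python
# Python 3 sketch: the text contains literal backslash-u sequences,
# as it would if read from a JavaScript source file.
raw = r"V posledn\u00edch m\u011bs\u00edc\u00edch se bezpe\u010dnostn\u00ed situace"


def decode_escapes(text: str) -> str:
    """Turn literal \\uXXXX sequences into the characters they name.

    decode_escapes is a hypothetical helper name. Encoding to ASCII first
    raises UnicodeEncodeError if the input already contains non-ASCII
    characters, which the unicode_escape codec would otherwise misread
    as Latin-1 bytes.
    """
    return text.encode("ascii").decode("unicode_escape")


decoded = decode_escapes(raw)
print(decoded)
```

Note that the codec also interprets other backslash escapes such as \n and \x41, which may or may not be wanted for a given input.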


3 Comments

'\u2019'.decode('unicode-escape') gives me u'\u2019' (Python 2.7.17)
My bad, r'\u2019'.decode('unicode-escape') gives u'\u2019', which prints as expected.
If you are dealing with a string in Python that is already encoded like this, use .encode().decode('unicode-escape').
1

If it is JavaScript code, then perhaps it's actually JSON, and you can use json.loads to decode it.
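A minimal sketch of this approach, assuming the fragment is the body of a string literal rather than a complete JSON document: json.loads only accepts a full JSON value, so a bare fragment has to be wrapped in double quotes first. The sample string is shortened, and the wrapping step is an assumption that holds only if the fragment contains no unescaped double quotes or control characters.

```python
import json

# Literal backslash-u sequences, shortened from the string in the question.
fragment = r"V posledn\u00edch m\u011bs\u00edc\u00edch"

# Passing the bare fragment fails, because it is not a valid JSON document.
try:
    json.loads(fragment)
    bare_fragment_parsed = True
except json.JSONDecodeError:
    bare_fragment_parsed = False

# Wrapping it in double quotes yields a JSON string literal,
# which the parser decodes, escapes included.
decoded_json = json.loads('"' + fragment + '"')
print(decoded_json)
```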

1 Comment

That does not seem to work straight away (it says it is not JSON), and the answer by BrenBarn actually works great. Thanks, though!
0

I had a similar issue, which was solved by the following (note that it also strips the accents, leaving plain ASCII):

unicodedata.normalize('NFD', my_string.decode('unicode-escape')).encode('ascii','ignore')
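The one-liner above is Python 2 code. A rough Python 3 equivalent, spelled out step by step, could look like this; strip_accents is a hypothetical helper name, and the sketch assumes the input holds only ASCII characters plus backslash escapes.

```python
import unicodedata


def strip_accents(escaped: str) -> str:
    """Decode \\uXXXX escapes, then drop diacritics to get plain ASCII.

    strip_accents is an illustrative name, not part of any library.
    """
    # Step 1: turn the literal escapes into real characters.
    text = escaped.encode("ascii").decode("unicode_escape")
    # Step 2: NFD splits each accented letter into a base letter
    # followed by a combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Step 3: encoding to ASCII with errors="ignore" discards the
    # combining marks (and any other non-ASCII character).
    return decomposed.encode("ascii", "ignore").decode("ascii")


plain = strip_accents(r"posledn\u00edch m\u011bs\u00edc\u00edch")
print(plain)  # poslednich mesicich
```

This loses information, so it fits cases such as building search keys or slugs, not recovering the original Czech text.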
