I have a large JSON file with UTF-8 encoded characters. How can I read this file and convert these characters to a more readable form? I have something like this:

{
    "name": "Wroc\u00c5\u0082aw"
}

And I want this:

{
    "name": "Wrocław"
}

1 Answer

If your JSON data contains mojibake like this, you can convert it to proper Unicode by encoding the string as Latin-1, then decoding the resulting bytes as UTF-8. This reverses the process that produced the mojibake: UTF-8 bytes that were mistakenly decoded as Latin-1. (The fact that the strings come from JSON is inconsequential; this works for any mojibake strings of this type.)

>>> s = "Wroc\u00c5\u0082aw"
>>> s.encode('latin-1').decode('utf-8')
'Wrocław'
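Since the question is about a whole JSON file rather than a single string, the same round trip can be applied to every string in the parsed data. The following is only a sketch: `fix_mojibake` is a hypothetical helper name, and it assumes every damaged string has exactly this Latin-1/UTF-8 form. Strings that cannot make the round trip are left as they are.

```python
import json


def fix_mojibake(value):
    """Recursively repair UTF-8 text that was mis-decoded as Latin-1."""
    if isinstance(value, str):
        try:
            return value.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Not mojibake of this kind (or already correct): leave unchanged.
            return value
    if isinstance(value, list):
        return [fix_mojibake(item) for item in value]
    if isinstance(value, dict):
        # Keys can be damaged too, so repair both keys and values.
        return {fix_mojibake(k): fix_mojibake(v) for k, v in value.items()}
    # Numbers, booleans and None pass through untouched.
    return value


# Same data as in the question, as it would appear in the file.
raw = '{"name": "Wroc\\u00c5\\u0082aw"}'
data = fix_mojibake(json.loads(raw))

# ensure_ascii=False writes real characters instead of \uXXXX escapes.
print(json.dumps(data, ensure_ascii=False, indent=4))
```

For a file on disk, `json.load` on a file opened with `encoding="utf-8"` would take the place of `json.loads(raw)`. One caveat: a string that is legitimately made up of Latin-1 characters which also happen to form valid UTF-8 would be converted as well, so it is worth spot-checking the output.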

In the general case, you have to reverse-engineer what produced the mojibake, but this particular case is easy to identify and troubleshoot, because Latin-1 is transparent: each code point from U+0000 to U+00FF is encoded as the single byte with the same value.
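To make the reverse-engineering concrete, here is a small sketch that reproduces the damage from the correct text. It assumes the mojibake arose from decoding UTF-8 bytes as Latin-1, which matches the string in the question but is not confirmed by it; the variable names are only illustrative.

```python
original = "Wrocław"

# Correct encoding: "ł" (U+0142) becomes the two bytes 0xC5 0x82.
utf8_bytes = original.encode("utf-8")

# The mistake: those bytes are read back as if they were Latin-1,
# so each byte turns into the code point with the same value.
garbled = utf8_bytes.decode("latin-1")
assert garbled == "Wroc\u00c5\u0082aw"

# The fix from the answer simply undoes that step.
restored = garbled.encode("latin-1").decode("utf-8")
assert restored == original

# Latin-1 is transparent: all 256 byte values survive a round trip.
all_bytes = bytes(range(256))
assert all_bytes.decode("latin-1").encode("latin-1") == all_bytes
```

If the garbled text had come from a different wrong codec, such as Windows-1252, the `latin-1` step would need to be replaced with that codec.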
