Reading JSON files with UTF-8 characters in Python

Question

I have a large JSON file with UTF-8 encoded characters. How can I read this file and convert these characters to a more readable form? I have something like this:

{
    "name": "Wroc\u00c5\u0082aw"
}

and I want this:

{
    "name": "Wrocław"
}

tripleee · Accepted Answer · 2021-04-26 10:30:33Z


If your JSON data contains mojibake like this, you can convert it to proper Unicode by encoding the string as Latin-1 and then decoding the resulting bytes as UTF-8. This reverses the process that produced the mojibake, namely UTF-8 bytes being decoded as if they were Latin-1. (The fact that the strings come from JSON is inconsequential; this works for any mojibake strings of this type.)

>>> s = "Wroc\u00c5\u0082aw"
>>> s.encode('latin-1').decode('utf-8')
'Wrocław'
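To see why the round trip works, here is a small sketch that reproduces the mojibake from the correct string. The UTF-8 encoding of `ł` is the two bytes `C5 82`; decoding those bytes as Latin-1 yields the two characters `\u00c5\u0082` seen in the question. The fix simply runs the same two steps backwards. (This is an illustration of one way such data could arise, not a claim about how the asker's file was produced.)

```python
original = "Wrocław"

# Encode to UTF-8 bytes, then (wrongly) decode those bytes as Latin-1.
# This turns the single character 'ł' into the two characters '\u00c5\u0082'.
garbled = original.encode("utf-8").decode("latin-1")
print(repr(garbled))  # 'WrocÅ\x82aw'

# Undo the mistake: Latin-1 maps each character back to its original byte,
# and decoding those bytes as UTF-8 recovers the intended text.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # Wrocław
```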

In the general case, you have to reverse-engineer which incorrect decoding produced the mojibake, but this particular case is easy to identify and fix because Latin-1 is transparent: every byte value from 0 to 255 maps to the Unicode code point with the same number, so encoding back to Latin-1 recovers the original bytes exactly.
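Since the question is about a whole JSON file, here is a minimal sketch that loads the file with `json.load` and applies the Latin-1/UTF-8 repair to every string key and value. It assumes the entire file suffers from the same mix-up; the helper names `fix_mojibake`, `fix_json` and `load_fixed_json` are hypothetical, not part of any library. Strings that cannot be round-tripped (characters outside Latin-1, or bytes that are not valid UTF-8) are left unchanged.

```python
import json


def fix_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were mis-decoded as Latin-1.

    Heuristic fallback: if the round trip fails, the string is assumed
    to be fine already and is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def fix_json(value):
    """Recursively apply fix_mojibake to every string in a parsed JSON value."""
    if isinstance(value, str):
        return fix_mojibake(value)
    if isinstance(value, list):
        return [fix_json(item) for item in value]
    if isinstance(value, dict):
        return {fix_json(key): fix_json(item) for key, item in value.items()}
    return value  # numbers, booleans and None need no repair


def load_fixed_json(path: str):
    # The \u escapes in the file are plain ASCII, so reading it as UTF-8 is safe;
    # json.load turns the escapes into (mojibake) characters, repaired afterwards.
    with open(path, encoding="utf-8") as f:
        return fix_json(json.load(f))
```

Usage would be something like `data = load_fixed_json("data.json")`, where the file name is only a placeholder. Note that the fallback is a heuristic: a string that legitimately contains a sequence such as `Ã©` would still be "repaired" to `é`, so only apply this to data you know is affected.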

edited Apr 26, 2021 at 10:30

answered Apr 26, 2021 at 10:22
