Getting readable text in a large JSON file in Python

Question

I have some nested data that I want to write as JSON. However, it contains a very large string which ends up dumped like this:

\u0398\u03b5\u03b1\u03c4\u03c1\u03b9\u03ba\u03cc

I need the dumped JSON to keep the original, human-readable text. I looked through some other posts but didn't find a solution.

This is how I currently dump the data:

import json

rawtext = json.dumps(result, indent=2, sort_keys=True)
with open("result.txt", "a+", encoding="utf-8-sig") as f:
    f.write(rawtext)

This is an example from the result.txt file:

{
  "date": "\u03a0\u03c1\u03b9\u03bd \u03b1\u03c0\u03cc 3 \u03ce\u03c1\u03b5\u03c2",
  "date_utc": "2020-04-01T16:12:41.903Z",
  "domain": "www.protothema.gr",
  "link": "https://www.protothema.gr/culture/article/991328/menoume-spiti-tzaz-taxidia-apo-to-kedro-politismou-idruma-stauros-niarhos/",
  "position": 1,
  "snippet": "... \u03c4\u03bf \u0398\u03b5\u03b1\u03c4\u03c1\u03b9\u03ba\u03cc \u0391\u03bd\u03b1\u03bb\u03cc\u03b3\u03b9\u03bf \u03c4\u03bf\u03c5 \u039a\u03ad\u03bd\u03c4\u03c1\u03bf\u03c5 \u03a0\u03bf\u03bb\u03b9\u03c4\u03b9\u03c3\u03bc\u03bf\u03cd \u038a\u03b4\u03c1\u03c5\u03bc\u03b1 \u03a3\u03c4\u03b1\u03cd\u03c1\u03bf\u03c2 \u039d\u03b9\u03ac\u03c1\u03c7\u03bf\u03c2! \u03a3\u03c4o \u03c0\u03c1\u03ce\u03c4o \u03b5\u03b2\u03b4\u03bf\u03bc\u03b1\u03b4\u03b9\u03b1\u03af\u03bf \u03c4\u03b6\u03b1\u03b6 \u03c1\u03b1\u03bd\u03c4\u03b5\u03b2\u03bf\u03cd \u03bf \u03ba\u03bf\u03c1\u03c5\u03c6\u03b1\u03af\u03bf\u03c2 \u0388\u03bb\u03bb\u03b7\u03bd\u03b1\u03c2 \u03c0\u03b9\u03b1\u03bd\u03af\u03c3\u03c4\u03b1\u03c2 \u03c4\u03b7\u03c2 jazz,\u00a0...",
  "source": "\u03a0\u03c1\u03ce\u03c4\u03bf \u0398\u0395\u039c\u0391",
  "thumbnail": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSGRNpdq5fhYi7be2t7UZ-hh-cQvjJqtsnJhN0ShCL7A6DqqPH9aop33FRGTcyfF2gsaU09SG-P&s",
  "title": "\u00ab\u039c\u03b5\u03bd\u03bf\u03c5\u03bc\u03b5 \u03a3\u03c0\u03af\u03c4\u03b9\u00bb: \u03a4\u03b6\u03b1\u03b6 \u03c4\u03b1\u03be\u03af\u03b4\u03b9\u03b1 \u03b1\u03c0\u03cc \u03c4\u03bf \u039a\u03ad\u03bd\u03c4\u03c1\u03bf \u03a0\u03bf\u03bb\u03b9\u03c4\u03b9\u03c3\u03bc\u03bf\u03cd ..."
}

Hey @quamrana, I know that. But I need to save it to a file, not just print it. Please check the post again; I just edited it.
– Alex Kalaidjiev, Commented Apr 1, 2020 at 19:48
If you are using json.dumps() to convert result (I'm assuming it's a list or dict) to a JSON string, then it won't be readable wherever it ends up if it contains Unicode strings.
– quamrana, Commented Apr 1, 2020 at 19:53

MisterMiyagi · Accepted Answer · 2022-11-10 10:05:33Z

By default, the json module escapes all non-ASCII characters. Pass ensure_ascii=False to keep Unicode characters unescaped:

>>> import json
>>> print(json.dumps("""{"date": "Πριν από 3 ώρες"}"""))
"{\"date\": \"\u03a0\u03c1\u03b9\u03bd \u03b1\u03c0\u03cc 3 \u03ce\u03c1\u03b5\u03c2\"}"
>>> print(json.dumps("""{"date": "Πριν από 3 ώρες"}""", ensure_ascii=False))
"{\"date\": \"Πριν από 3 ώρες\"}"

Simply pass the parameter when dumping your data:

with open("result.txt", "a+", encoding="utf-8-sig") as f:
    json.dump(result, f, ensure_ascii=False, indent=2, sort_keys=True)
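
To check the result end to end, here is a minimal sketch that writes a small sample and reads the file back. The sample dictionary, the dump_readable helper, and the temporary path are assumptions made for illustration. It also opens the file with "w" instead of "a+", on the assumption that the file should hold a single JSON document.

```python
import json
import os
import tempfile

# Hypothetical sample data standing in for `result` from the question.
result = {"date": "Πριν από 3 ώρες", "position": 1}


def dump_readable(data, path):
    """Write data as JSON, keeping non-ASCII characters unescaped."""
    # "w" replaces any previous content so the file holds one JSON document.
    with open(path, "w", encoding="utf-8-sig") as f:
        json.dump(data, f, ensure_ascii=False, indent=2, sort_keys=True)


# Temporary location used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "result.txt")
dump_readable(result, path)

# Read with the same encoding so the byte order mark is stripped again.
with open(path, encoding="utf-8-sig") as f:
    text = f.read()

print(text)
```

Because the file is written with utf-8-sig, it starts with a byte order mark. Reading it back with plain utf-8 leaves that mark in the text, and json.loads rejects a string that begins with it, so the reader should use utf-8-sig as well.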

Note that JSON with and without non-ASCII escaping is equivalent as far as the JSON standard is concerned. While a pure-ASCII dump may not look human-readable, a JSON-compliant reader such as json.load will read back the original data correctly.
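
The equivalence can be checked with a short round trip. This is only a sketch; the sample dictionary is made up for illustration, using one value from the question's output.

```python
import json

# Hypothetical sample value taken from the question's output.
data = {"source": "Πρώτο ΘΕΜΑ"}

escaped = json.dumps(data)                       # ASCII only, uses \uXXXX escapes
readable = json.dumps(data, ensure_ascii=False)  # keeps the Greek text as-is

# The two strings look different on the page...
print(escaped)
print(readable)

# ...but both parse back to the same Python object.
assert json.loads(escaped) == json.loads(readable) == data
```

So ensure_ascii=False only changes how the file looks to a person reading it, not what a parser gets out of it.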

edited Nov 10, 2022 at 10:05

answered Apr 1, 2020 at 20:24

2 Comments

Alex Kalaidjiev Over a year ago

That was the problem! It worked for me, thanks a lot!

MisterMiyagi Over a year ago

Glad to have helped. Please take the time to have a look at the "What should I do when someone answers my question?" help page.
