
I have some nested data that I want to write as JSON. However, it contains a very large string that ends up dumped as escape sequences like this:

\u0398\u03b5\u03b1\u03c4\u03c1\u03b9\u03ba\u03cc

I need the dumped JSON to keep the original, human-readable text. I looked through some other posts but didn't find a solution.

This is how I currently dump the data:

import json

# result is the nested data to be saved
rawtext = json.dumps(result, indent=2, sort_keys=True)
with open("result.txt", "a+", encoding="utf-8-sig") as f:
    f.write(rawtext)

This is an example from the result.txt file:

{
  "date": "\u03a0\u03c1\u03b9\u03bd \u03b1\u03c0\u03cc 3 \u03ce\u03c1\u03b5\u03c2",
  "date_utc": "2020-04-01T16:12:41.903Z",
  "domain": "www.protothema.gr",
  "link": "https://www.protothema.gr/culture/article/991328/menoume-spiti-tzaz-taxidia-apo-to-kedro-politismou-idruma-stauros-niarhos/",
  "position": 1,
  "snippet": "... \u03c4\u03bf \u0398\u03b5\u03b1\u03c4\u03c1\u03b9\u03ba\u03cc \u0391\u03bd\u03b1\u03bb\u03cc\u03b3\u03b9\u03bf \u03c4\u03bf\u03c5 \u039a\u03ad\u03bd\u03c4\u03c1\u03bf\u03c5 \u03a0\u03bf\u03bb\u03b9\u03c4\u03b9\u03c3\u03bc\u03bf\u03cd \u038a\u03b4\u03c1\u03c5\u03bc\u03b1 \u03a3\u03c4\u03b1\u03cd\u03c1\u03bf\u03c2 \u039d\u03b9\u03ac\u03c1\u03c7\u03bf\u03c2! \u03a3\u03c4o \u03c0\u03c1\u03ce\u03c4o \u03b5\u03b2\u03b4\u03bf\u03bc\u03b1\u03b4\u03b9\u03b1\u03af\u03bf \u03c4\u03b6\u03b1\u03b6 \u03c1\u03b1\u03bd\u03c4\u03b5\u03b2\u03bf\u03cd \u03bf \u03ba\u03bf\u03c1\u03c5\u03c6\u03b1\u03af\u03bf\u03c2 \u0388\u03bb\u03bb\u03b7\u03bd\u03b1\u03c2 \u03c0\u03b9\u03b1\u03bd\u03af\u03c3\u03c4\u03b1\u03c2 \u03c4\u03b7\u03c2 jazz,\u00a0...",
  "source": "\u03a0\u03c1\u03ce\u03c4\u03bf \u0398\u0395\u039c\u0391",
  "thumbnail": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSGRNpdq5fhYi7be2t7UZ-hh-cQvjJqtsnJhN0ShCL7A6DqqPH9aop33FRGTcyfF2gsaU09SG-P&s",
  "title": "\u00ab\u039c\u03b5\u03bd\u03bf\u03c5\u03bc\u03b5 \u03a3\u03c0\u03af\u03c4\u03b9\u00bb: \u03a4\u03b6\u03b1\u03b6 \u03c4\u03b1\u03be\u03af\u03b4\u03b9\u03b1 \u03b1\u03c0\u03cc \u03c4\u03bf \u039a\u03ad\u03bd\u03c4\u03c1\u03bf \u03a0\u03bf\u03bb\u03b9\u03c4\u03b9\u03c3\u03bc\u03bf\u03cd ..."
}

  • Hey @quamrana, I know that. But I need to save it to a file, not just print it. Check the post again, I just edited it. Commented Apr 1, 2020 at 19:48
  • If you are using json.dumps() to convert result (I'm assuming it's a list or dict) to a JSON string, then the output won't be readable wherever it ends up when it contains Unicode strings. Commented Apr 1, 2020 at 19:53

1 Answer

By default, the json module escapes all non-ASCII characters. Pass ensure_ascii=False to keep Unicode characters unescaped:

>>> print(json.dumps("""{"date": "Πριν από 3 ώρες"}"""))
"{\"date\": \"\u03a0\u03c1\u03b9\u03bd \u03b1\u03c0\u03cc 3 \u03ce\u03c1\u03b5\u03c2\"}"
>>> print(json.dumps("""{"date": "Πριν από 3 ώρες"}""", ensure_ascii=False))
"{\"date\": \"Πριν από 3 ώρες\"}"

Simply pass the parameter when dumping your data:

with open("result.txt", "a+", encoding="utf-8-sig") as f:
    json.dump(result, f, ensure_ascii=False, indent=2, sort_keys=True)
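
For a self-contained check, here is a minimal sketch of the same call with made-up sample data. The `result` dict, the temporary path, and the choice of mode "w" with encoding "utf-8" are assumptions for illustration, not part of the original code. Mode "w" is used because appending several dumps to one file with "a+" leaves multiple JSON documents back to back, which json.load cannot parse in a single call.

```python
import json
import os
import tempfile

# Hypothetical sample data standing in for the question's `result`.
result = {"source": "Πρώτο ΘΕΜΑ", "position": 1}

# Write to a throwaway file so the sketch does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "result.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(result, f, ensure_ascii=False, indent=2, sort_keys=True)

# The file now holds the Greek text itself rather than \uXXXX escapes.
with open(path, encoding="utf-8") as f:
    text = f.read()
print(text)

# Parsing the text gives back the original data.
loaded = json.loads(text)
```

If the file has to keep the "utf-8-sig" encoding, it should also be opened with "utf-8-sig" when reading, so that the byte order mark is stripped before parsing.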

Note that JSON with and without non-ASCII escaping is equivalent as far as the JSON standard is concerned. While a pure-ASCII dump may not look human-readable, a JSON-compliant reader such as json.load will read back the original data properly.
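
The equivalence can be checked directly. This is a small sketch with a made-up `data` dict (an assumption, not the asker's data): both dumps decode to the same object, and only the default one is pure ASCII.

```python
import json

# Hypothetical sample data, not taken from the original result.
data = {"date": "Πριν από 3 ώρες", "position": 1}

escaped = json.dumps(data)                       # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)  # keeps the Greek text as-is

# Both strings decode to the same Python object.
same = json.loads(escaped) == json.loads(readable) == data

# Only the escaped form consists solely of ASCII characters.
print(same, escaped.isascii(), readable.isascii())
```

So the choice of ensure_ascii only affects how the file looks to a person opening it, not what a JSON parser gets out of it.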

2 Comments

That was the problem! It worked for me, thanks a lot!
Glad to have helped.