
I have lots of JSON files to parse, each 1-2 MB in size. Ordinarily I would have no issue loading data from a JSON file as a dict using json.load(json_file). However, in this case each file is a string of multiple nested dictionaries, all on one line.

Dictionaries are not delimited by "," as they would be in a list. I just have one very long string of nested dictionaries per file. For example, in the snippet below I have two nested dictionaries, each with a single key at the outer level of the dict ("GGGGHH" and "GGGHGH" for the first and second dictionaries, respectively).

{"GGGGHH": {"b2": {"spectrum_89": ["115.0502"]}, "b3": {"spectrum_89": ["172.0716"], "spectrum_107": ["172.0717"]}, "b4": {"spectrum_89": ["229.0934"]}, "b5": {"spectrum_89": ["366.1527"], "spectrum_107": ["366.1537"]}, "y1": {"spectrum_89": ["156.0769"], "spectrum_107": ["156.0769"]}, "y2": {"spectrum_89": ["293.1353"]}, "y3": {"spectrum_89": ["372.1407"], "spectrum_107": ["350.1563"]}, "a4": {"spectrum_89": ["202.1087"]}, "ImH": {"spectrum_89": ["110.0715"], "spectrum_107": ["110.0715"]}}}{"GGGHGH": {"b2": {"spectrum_89": ["115.0502"]}, "b3": {"spectrum_89": ["172.0716"], "spectrum_107": ["172.0717"]}, "b4": {"spectrum_89": ["309.1312"], "spectrum_107": ["309.1314"]}, "b5": {"spectrum_89": ["366.1527"], "spectrum_107": ["366.1537"]}, "y1": {"spectrum_89": ["156.0769"], "spectrum_107": ["156.0769"]}, "y2": {"spectrum_89": ["213.0985"], "spectrum_107": ["213.0985"]}, "y3": {"spectrum_89": ["372.1407"], "spectrum_107": ["350.1563"]}, "ImH": {"spectrum_89": ["110.0715"], "spectrum_107": ["110.0715"]}}}

I have seen examples of parsing multiple JSON objects, but only when they are in an array.

Can anyone help with this? I have no control over the format of the JSON files, so regenerating the data in an easier format is not an option. Apologies if this question has been answered before - I couldn't see any answers that would work for this particular case.

3 Comments
  • It's absolutely immaterial whether JSON is split into multiple lines or presented in a single line, as long as it's well formed. The rest is just beautification. Commented Sep 24, 2020 at 16:57
  • Could you please add an example of desired output as well as your code (if you have written any)? Commented Sep 24, 2020 at 16:57
  • Looks like invalid JSON to me. I ran it through https://jsonlint.com/ Commented Sep 24, 2020 at 16:58

2 Answers


This looks very much like malformed ndjson (newline-delimited JSON). You can replace }{ with }\n{ and then parse the result with the ndjson package.

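import ndjson

with open('spam.json') as f:
    source = f.read()

# Put each top-level object on its own line so the text becomes valid ndjson.
source = source.replace('}{', '}\n{')
data = ndjson.loads(source)  # list with one dict per line

print(data)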
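
If installing ndjson is not an option, roughly the same idea can be sketched with only the standard library. Here parse_back_to_back is a hypothetical helper name, and the sketch assumes that }{ never appears inside a string value and that the objects themselves contain no raw newlines:

```python
import json


def parse_back_to_back(source):
    """Parse JSON objects joined as '}{' by moving each onto its own line.

    Hypothetical helper; assumes '}{' never occurs inside a string value.
    """
    lines = source.replace('}{', '}\n{').split('\n')
    # Each non-empty line is now expected to hold one complete JSON object.
    return [json.loads(line) for line in lines if line.strip()]


# Shortened sample in the same shape as the data in the question.
sample = '{"GGGGHH": {"b2": {"spectrum_89": ["115.0502"]}}}{"GGGHGH": {"b2": {"spectrum_89": ["115.0502"]}}}'
data = parse_back_to_back(sample)
print([list(d) for d in data])  # [['GGGGHH'], ['GGGHGH']]
```

The result is a plain list of dicts, the same shape that ndjson.loads returns.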


Your string is invalid json, but it looks like it's just a bunch of valid json dictionaries joined back-to-back without commas.
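
To see what "invalid" means in practice, here is a small check on a shortened version of the data (the sample string is my own abbreviation of the question's snippet). As far as I know, json.loads parses the first object and then reports the leftover text as an error:

```python
import json

# Two valid objects joined back-to-back, shortened from the question's data.
s = '{"GGGGHH": {"b2": {"spectrum_89": ["115.0502"]}}}{"GGGHGH": {"b2": {"spectrum_89": ["115.0502"]}}}'

error = None
try:
    json.loads(s)
except json.JSONDecodeError as exc:
    # The first object parses fine; the text after it is what gets rejected.
    error = exc
    print(exc.msg, "at character", exc.pos)
```

The reported position points at the opening brace of the second dictionary, which is exactly where a separator is missing.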

Just add commas between the dictionaries by replacing any occurrences of "}{" with "}, {", stick it in between "[" and "]" to make it valid json for a list of dictionaries, and you're good to json.loads!

import json

s = '{"GGGGHH": {"b2": {"spectrum_89": ["115.0502"]}, "b3": {"spectrum_89": ["172.0716"], "spectrum_107": ["172.0717"]}, "b4": {"spectrum_89": ["229.0934"]}, "b5": {"spectrum_89": ["366.1527"], "spectrum_107": ["366.1537"]}, "y1": {"spectrum_89": ["156.0769"], "spectrum_107": ["156.0769"]}, "y2": {"spectrum_89": ["293.1353"]}, "y3": {"spectrum_89": ["372.1407"], "spectrum_107": ["350.1563"]}, "a4": {"spectrum_89": ["202.1087"]}, "ImH": {"spectrum_89": ["110.0715"], "spectrum_107": ["110.0715"]}}}{"GGGHGH": {"b2": {"spectrum_89": ["115.0502"]}, "b3": {"spectrum_89": ["172.0716"], "spectrum_107": ["172.0717"]}, "b4": {"spectrum_89": ["309.1312"], "spectrum_107": ["309.1314"]}, "b5": {"spectrum_89": ["366.1527"], "spectrum_107": ["366.1537"]}, "y1": {"spectrum_89": ["156.0769"], "spectrum_107": ["156.0769"]}, "y2": {"spectrum_89": ["213.0985"], "spectrum_107": ["213.0985"]}, "y3": {"spectrum_89": ["372.1407"], "spectrum_107": ["350.1563"]}, "ImH": {"spectrum_89": ["110.0715"], "spectrum_107": ["110.0715"]}}}'
json.loads("[" + s.replace("}{", "}, {") + "]")

Output:

[{'GGGGHH': {'b2': {'spectrum_89': ['115.0502']},
   'b3': {'spectrum_89': ['172.0716'], 'spectrum_107': ['172.0717']},
   'b4': {'spectrum_89': ['229.0934']},
   'b5': {'spectrum_89': ['366.1527'], 'spectrum_107': ['366.1537']},
   'y1': {'spectrum_89': ['156.0769'], 'spectrum_107': ['156.0769']},
   'y2': {'spectrum_89': ['293.1353']},
   'y3': {'spectrum_89': ['372.1407'], 'spectrum_107': ['350.1563']},
   'a4': {'spectrum_89': ['202.1087']},
   'ImH': {'spectrum_89': ['110.0715'], 'spectrum_107': ['110.0715']}}},
 {'GGGHGH': {'b2': {'spectrum_89': ['115.0502']},
   'b3': {'spectrum_89': ['172.0716'], 'spectrum_107': ['172.0717']},
   'b4': {'spectrum_89': ['309.1312'], 'spectrum_107': ['309.1314']},
   'b5': {'spectrum_89': ['366.1527'], 'spectrum_107': ['366.1537']},
   'y1': {'spectrum_89': ['156.0769'], 'spectrum_107': ['156.0769']},
   'y2': {'spectrum_89': ['213.0985'], 'spectrum_107': ['213.0985']},
   'y3': {'spectrum_89': ['372.1407'], 'spectrum_107': ['350.1563']},
   'ImH': {'spectrum_89': ['110.0715'], 'spectrum_107': ['110.0715']}}}]

For a more general case (for example, if there can be whitespace between two dictionaries), use a regular expression for the replacement.

import re

json.loads("[" + re.sub(r"\}\s*\{", "}, {", s) + "]")

where the regex "\}\s*\{" matches }, followed by 0 or more whitespace characters, followed by {.
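
One caveat with both replacements: they would also rewrite a }{ that happens to sit inside a string value. If that could occur in your data, a different technique is to let the parser find the object boundaries itself with json.JSONDecoder.raw_decode from the standard library. This is only a sketch; iter_json_objects is a hypothetical helper name and the sample string is made up to show the edge case:

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value found back-to-back in text.

    Hypothetical helper built on json.JSONDecoder.raw_decode.
    """
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos == end:
            break
        # Returns the parsed value and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Made-up sample: the first object holds "}{" inside a string value.
s = '{"a": {"x": ["}{"]}}  {"b": {"y": ["1.0"]}}'
data = list(iter_json_objects(s))
print(data)
```

Because no text is rewritten, the }{ inside the string survives untouched, and whitespace between objects is handled without a regex.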

4 Comments

As an alternative, one can replace }{ with }\n{ and then just use ndjson. I will add an example.
@buran Interesting, TIL such a thing exists. Are there any advantages to doing it using ndjson instead of parsing it as a comma-separated list?
I would say it's a matter of personal preference. That's why I said "as an alternative".
Apologies for the late reply. This worked perfectly, thank you!
