So here is the standard way to read a JSON file in Python:

import json
from pprint import pprint

with open('ig001.json') as data_file:    
    data = json.load(data_file)

pprint(data)

However, the JSON file I want to read contains multiple top-level JSON arrays, one after another. It looks something like:

[{},{}.... ]

[{},{}.... ]

This represents two top-level JSON arrays, and each {} inside them is a JSON object holding a bunch of key:value pairs.

So when I try to read this with the standard code above, I get this error:

Traceback (most recent call last):
  File "jsonformatter.py", line 5, in <module>
    data = json.load(data_file)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 290, in load
    **kw)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 369, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 3889 column 2 - line 719307 column 2 (char 164691 - 30776399)

Line 3889 is where the first JSON array ends and the next one begins; the line itself looks like "][".
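A minimal reproduction, using made-up contents in place of the real file, shows where the "Extra data" error comes from: `json.loads` (and `json.load`, which just calls it on the file's contents) parses exactly one top-level value and rejects anything left over after it. The sample puts `][` at the start of its third line, mirroring line 3889 above.

```python
import json

# Made-up stand-in for the file: two top-level arrays, with "][" on line 3.
text = '[\n{"a": 1}\n][\n{"b": 2}\n]'

message = None
try:
    json.loads(text)  # json.load(f) behaves the same on f.read()
except ValueError as err:  # json.JSONDecodeError subclasses ValueError on Python 3
    message = str(err)

print(message)  # Extra data: line 3 column 2 (char 12)
```

The printed message is the Python 3 format; Python 2.7, as in the traceback, reports the leftover region as a start-end range instead.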

Any ideas on how to fix this would be appreciated, thanks!

  • Can you post your json file? Commented Apr 8, 2016 at 6:09
  • You mean multiple JSON arrays, right? Commented Apr 8, 2016 at 6:15
  • Please provide more data on the JSON file that you have. Commented May 13, 2016 at 3:12

1 Answer

Without a link to your JSON file, I'm going to have to make some assumptions:

  1. Top-level JSON arrays are not each on their own line (since the first parsing error is on line 3889), so we can't easily parse the file one line at a time.
  2. This is the only type of invalid JSON present in the file.
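Assumption 1 matters because, if every top-level array did sit on a line of its own, the file could be parsed one line at a time with no string surgery. A sketch under that hypothetical layout (the helper name and the sample data are made up; the question's file does not look like this):

```python
import io
import json


def load_one_array_per_line(lines):
    # Skip blank lines; every other line must be one complete JSON document.
    return [json.loads(line) for line in lines if line.strip()]


# In-memory stand-in for a file with one array per line (made-up contents).
sample = io.StringIO('[{"a": 1}, {"b": 2}]\n[{"c": 3}]\n')
arrays = load_one_array_per_line(sample)
print(arrays)  # [[{'a': 1}, {'b': 2}], [{'c': 3}]]
```

With a real file, the open file object can be passed directly, because iterating over a file yields its lines.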

To fix this:

import json

with open('ig001.json') as data_file:
    raw_data = data_file.read()  # we're going to need the entire file in memory

# 1. replace instances of `][` with `]<SPLIT>[`
# (`<SPLIT>` needs to be something that is not present anywhere in the file to begin with)

tweaked_data = raw_data.replace('][', ']<SPLIT>[')

# 2. split the string into a list of strings, using the chosen split marker

split_data = tweaked_data.split('<SPLIT>')

# 3. load each string individually; parsed_data ends up as a list of arrays

parsed_data = [json.loads(bit_of_data) for bit_of_data in split_data]

(pardon the horrible variable names)
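The replace-and-split trick relies on `][` appearing only at array boundaries; a string value such as "see [1][2]" would be split too, and the pieces would fail to parse. A sketch that avoids that assumption uses the standard library's `json.JSONDecoder.raw_decode`, which is a different technique from the one in the answer: it parses one JSON value starting at a given index and returns the value together with the index just past it, so top-level values can be decoded back to back. The helper name `load_concatenated_json` and the sample data are made up for illustration.

```python
import json


def load_concatenated_json(text):
    """Decode every top-level JSON value in `text`, in order of appearance."""
    decoder = json.JSONDecoder()
    values = []
    idx = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here
        # (this covers newlines between arrays and at the end of the file).
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx == len(text):
            break
        # Parse one value starting at idx; get back the index just past it.
        value, idx = decoder.raw_decode(text, idx)
        values.append(value)
    return values


# With the file from the question (still read fully into memory):
# with open('ig001.json') as data_file:
#     arrays = load_concatenated_json(data_file.read())

# Made-up sample in which a string value itself contains "][":
sample = '[{"note": "see [1][2]"}][{"b": 2}]\n'
arrays = load_concatenated_json(sample)
print(arrays)  # [[{'note': 'see [1][2]'}], [{'b': 2}]]

# Optionally merge the arrays into one flat list of objects.
records = [obj for array in arrays for obj in array]
print(records)  # [{'note': 'see [1][2]'}, {'b': 2}]
```

Like the split approach, this still holds the whole file in memory; the gain is that string contents never confuse the boundary detection, because the decoder itself decides where each array ends.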
