4

I have tens of thousands of JSON objects in a single JSON file, in the following format:

{ "a": 1,
  "b" : 2,
  "c" : {
          "d":3
        }
}{ "e" : 4,
  "f" : 5,
  "g" : {
         "h":6
        }
}

How can I load these as a json object?

Here are two methods I've tried, with the corresponding errors:

Method 1 :

>>> with open('test1.json') as jsonfile:
...     for line in jsonfile:
...             data = json.loads(line)
... 

Error :

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.5/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 10)

Method 2 :

>>> with open('test1.json') as jsonfile:
...     data = json.load(jsonfile)      
... 

Error :

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python3.5/json/__init__.py", line 268, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.5/json/decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 7 column 1 (char 46)
>>> 

I've read the related questions but none of them helped.

9
  • The first bit of code you've posted contains two JSON objects, not one. That's probably the cause of the json.load error. Commented Sep 25, 2018 at 19:11
  • 1
    Are there always blank lines between the objects, and are those the only blank lines? Commented Sep 25, 2018 at 19:13
  • @DanielRoseman no. I've edited. Have a look now. Commented Sep 25, 2018 at 19:15
  • 1
    Is it possible to change the process that generates the JSON file so that it inserts two or more line breaks between each JSON object? Then the approach @DanielRoseman has in mind would work. Commented Sep 25, 2018 at 19:19
  • 1
    @krishna serialize it as a list of objects, or use a different format, like json-lines. Commented Sep 25, 2018 at 19:30

6 Answers

6

The content of the file you described is not a single valid JSON document, which is why both approaches fail.

To transform it into something you can load with json.load(fd), you have to:

  1. add a [ at the beginning of the file
  2. add a , between each object
  3. add a ] at the very end of the file

Then you can use Method 2. For instance:

[ { "a": 1,
    "b" : 2,
    "c" : {
      "d":3
    }
  }, { "e" : 4,
       "f" : 5,
       "g" : {
         "h":6
       }
  }
]

is a valid JSON array.

If the file format is exactly as you've described (each object directly followed by the next, with no whitespace between } and {), you could do:

import json

with open(filename, 'r') as infile:
    data = infile.read()

# Put a comma between adjacent objects, then wrap everything in [ ]
new_data = data.replace('}{', '},{')
json_data = json.loads(f'[{new_data}]')
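
If whitespace or newlines can appear between the closing } and the next {, an exact-match replace on '}{' will miss those boundaries. A minimal sketch of a regex variant follows. The helper name load_concatenated is hypothetical, and the sketch assumes that } followed by { never occurs inside a string value, since the substitution would corrupt such a string.

```python
import json
import re

def load_concatenated(text):
    """Parse back-to-back JSON objects by rewriting them as one JSON array.

    Hypothetical helper. Assumes '}' followed by optional whitespace and '{'
    only ever appears at a boundary between objects, never inside a string.
    """
    # Insert a comma at each object boundary, tolerating whitespace/newlines
    joined = re.sub(r'\}\s*\{', '},{', text)
    # Wrap in brackets so the whole text is a single JSON array
    return json.loads('[' + joined + ']')

sample = '{ "a": 1, "c": {"d": 3}}{ "e": 4}\n{ "g": {"h": 6}}'
objects = load_concatenated(sample)
print(objects)
```

If string values might contain }{, a decoder-based approach such as raw_decode is the safer choice, because it respects JSON string boundaries.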

1 Comment

Your second method helped. Thanks a lot.
4

I believe the best approach, if you don't want to change the source file, is to use json.JSONDecoder.raw_decode(). It decodes one JSON value starting at a given index and returns both the value and the index where it ended, so you can iterate through each valid JSON object in the file:

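from json import JSONDecoder, JSONDecodeError

decoder = JSONDecoder()
content = '{ "a": 1,  "b": 2,  "c": { "d":3 }}{ "e": 4, "f": 5,  "g": {"h":6 } }'

pos = 0
while True:
    # raw_decode does not skip leading whitespace, so step over any separator
    while pos < len(content) and content[pos].isspace():
        pos += 1
    try:
        # Returns the decoded object and the index just past its end
        o, pos = decoder.raw_decode(content, pos)
        print(o)
    except JSONDecodeError:
        # Raised at the end of the input (and also on malformed data)
        break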

This prints your two JSON objects.

2 Comments

Need to increment pos, see code sample from @noredistribution below
It seems the second positional argument to raw_decode vanished. :(
3

As Daniel said in a comment, focus on the pattern that marks where one JSON chunk ends and the next begins. After your update, that pattern is }{.

Load all the data into a string, replace this pattern with one you can split on, and split the string into a list of valid JSON strings. Finally, iterate over the list.

{ "a": 1,
"b" : 2,
"c" : {
        "d":3
        }
}{ "e" : 4,
"f" : 5,
"g" : {
        "h":6
        }
}

Load the file contents into a string

with open('/home/mauro/workspace/test.json') as fp:
    data = fp.read()

Replace the pattern

data = data.replace('}{', '}\n\n{')

Then split it into a list of valid JSON strings

data = data.split('\n\n')

Finally, iterate over the list of JSON strings

import json

for i in data:
    print(json.loads(i))


1

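@Thiago's answer worked for me, but only if pos is advanced past the whitespace between objects after each decode; otherwise it only ever prints one object. raw_decode does not skip leading whitespace, so incrementing by exactly one works when a single character (such as a newline) separates the objects, and skipping all whitespace covers the general case.

So, like this:

from json import JSONDecoder, JSONDecodeError

def json_decoder(data):
    decoder = JSONDecoder()
    pos = 0
    result = []
    while True:
        try:
            o, pos = decoder.raw_decode(data, pos)
            result.append(o)
            # Step over any whitespace before the next object
            while pos < len(data) and data[pos].isspace():
                pos += 1
        except JSONDecodeError:
            # Raised at the end of the input (and also on malformed data)
            break
    return result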


1

I wrote this script, which takes advantage of the fact that the JSONDecodeError records the character position (e.pos) where the first JSON object ends:

import json
from json.decoder import JSONDecodeError

with open("file.json", 'r') as file:
    contents = file.read()

start = 0
end = len(contents)
json_objects = []
while start < len(contents):
    try:
        json_objects.append(json.loads(contents[start:end]))
        print(f"Loaded from {start} to {end}")
        start = end
        end = len(contents)
    except JSONDecodeError as e:
        # e.pos is relative to the slice, so offset it by start
        new_end = start + e.pos
        if new_end == end:
            raise  # no progress: the data here is genuinely invalid
        end = new_end
for json_object in json_objects:
    print(len(json.dumps(json_object)))

It is not at all efficient, because each object is parsed twice, and it requires the entire file to be loaded into memory, but it does work.
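
If memory is the main concern, one alternative is to read the file in chunks and decode objects from a buffer as they become complete. The following is only a sketch under stated assumptions: the name iter_json_stream and the chunk_size parameter are hypothetical, and every top-level value is assumed to be an object or array (a bare number split across two chunks could be decoded too early).

```python
import io
from json import JSONDecoder, JSONDecodeError

def iter_json_stream(fp, chunk_size=65536):
    """Yield concatenated JSON objects from a text file object, chunk by chunk.

    Hypothetical helper. Assumes top-level values are objects or arrays.
    """
    decoder = JSONDecoder()
    buf = ""
    while True:
        chunk = fp.read(chunk_size)
        buf += chunk
        while True:
            # raw_decode does not skip leading whitespace
            buf = buf.lstrip()
            if not buf:
                break
            try:
                obj, end = decoder.raw_decode(buf)
            except JSONDecodeError:
                break  # object is incomplete so far: read more data
            yield obj
            buf = buf[end:]
        if not chunk:  # end of file
            if buf:
                raise ValueError("trailing data is not valid JSON")
            return

# A tiny chunk size forces objects to span several reads
stream = io.StringIO('{ "a": 1}{"e": 4}\n{"g": {"h": 6}}')
streamed = list(iter_json_stream(stream, chunk_size=5))
print(streamed)
```

Only the current incomplete object and one chunk are held in memory, at the cost of retrying the decode whenever an object spans several chunks.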


0
[
    { 
      "a": 1,
      "b" : 2,
      "c" : {
              "d":3
            }
    },
    { 
      "e" : 4,
      "f" : 5,
      "g" : {
             "h":6
            }
    }
]

First, your JSON file should look like this: a single array with the objects separated by commas. Second, load the file with json.loads(file.read()).
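
If you control the code that produces the file, one way to get this layout is to collect the objects in a list and write them with a single json.dump call. A minimal round-trip sketch, assuming the objects fit in memory; the file name objects.json and the temporary directory are placeholders for illustration:

```python
import json
import os
import tempfile

objects = [
    {"a": 1, "b": 2, "c": {"d": 3}},
    {"e": 4, "f": 5, "g": {"h": 6}},
]

# Write to a temporary directory so the sketch does not touch real data
path = os.path.join(tempfile.mkdtemp(), "objects.json")
with open(path, "w") as outfile:
    # One dump call for the whole list produces a single valid JSON array
    json.dump(objects, outfile, indent=2)

with open(path) as infile:
    loaded = json.load(infile)

print(loaded == objects)
```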
