
I am downloading JSON data from an API, and I use the following code to write it out. Each item in the loop gives me one JSON object. I need to save the objects as they arrive and later extract entities from the appended file using a loop.

for item in style_ls:
    dat = get_json(api, item)
    specs_dict[item] = dat
    with open("specs_append.txt", "a") as myfile:
        json.dump(dat, myfile)
        myfile.close()
    print item

with open("specs_data.txt", "w") as myfile:
    json.dump(specs_dict, myfile)
    myfile.close()

I know that I cannot get valid JSON from specs_append.txt, but I can from specs_data.txt. I am writing the first file only because my program needs at least 3-4 days to complete, and there is a high chance my system may shut down partway through. So is there any way I can do this efficiently?

If not, is there any way I can extract the objects from specs_append.txt, which has the form {JSON}{JSON} (not valid JSON)?

If not, should I write specs_dict to a text file on every iteration of the loop, so that even if the program gets terminated I can restart from that point in the loop and still end up with valid JSON?

myfile.close() is not required in the with block; the context manager takes care of that for you. Commented Jun 26, 2014 at 6:51

2 Answers


I suggest several possible solutions.

One solution is to write custom code to slurp in the input file. I would suggest putting a special line before each JSON object in the file, such as: ###

Then you could write code like this:

import json

SPECIAL_LINE = '###\n'  # marker line written before each JSON object

def json_get_objects(f):
    temp = ''
    line = next(f)  # pull first line, which should be the marker
    assert line == SPECIAL_LINE

    for line in f:
        if line != SPECIAL_LINE:
            temp += line
        else:
            # found special marker, temp now contains a complete JSON object
            j = json.loads(temp)
            yield j
            temp = ''
    # after loop done, yield up last JSON object
    if temp:
        j = json.loads(temp)
        yield j

with open("specs_data.txt", "r") as f:
    for j in json_get_objects(f):
        pass # do something with JSON object j

Two notes on this. First, I am simply appending to a string over and over; this used to be a very slow operation in Python, so if you are using a very old version of Python, don't do it this way unless your JSON objects are very small. Second, I wrote code that splits the input and yields up JSON objects one at a time, but you could also use a guaranteed-unique string: slurp in all the data with a single call to f.read() and then split on that string with the str.split() method.

Another solution would be to write the whole file as a valid JSON list of valid JSON objects. Write the file like this:

{"mylist":[
# first JSON object, followed by a comma
# second JSON object, followed by a comma
# third JSON object
]}

This would require your file-appending code to open the file with write permission, seek to the last ] in the file, write a comma plus a newline followed by the new JSON object, and finally write ]} to close out the file again. If you do it this way, you can use json.loads() to slurp the whole thing in and get back a list of JSON objects.
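As a sketch of that appending logic (the helper name is mine, and I drop the newlines so the seek arithmetic stays simple: the file always ends in exactly the two bytes ]}):

```python
import json
import os

def append_json_object(path, obj):
    # Hypothetical helper: keep the file shaped like {"mylist":[obj,obj,...]}
    # so it is valid JSON after every append, even if the program dies later.
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(b'{"mylist":[')
            f.write(json.dumps(obj).encode("utf-8"))
            f.write(b']}')
        return
    with open(path, "rb+") as f:
        f.seek(-2, os.SEEK_END)                     # back up over the final b']}'
        f.write(b",")                               # comma after the previous object
        f.write(json.dumps(obj).encode("utf-8"))
        f.write(b']}')                              # re-close the list and wrapper
```

Opening in binary mode matters here: text-mode files don't support seeking relative to the end.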

Finally, I suggest that maybe you should just use a database. Use SQLite or something and just throw the JSON strings in to a table. If you choose this, I suggest using an ORM to make your life simple, rather than writing SQL commands by hand.
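If you'd rather skip the ORM, the standard-library sqlite3 module is already enough; here is a minimal sketch (the table and column names are made up):

```python
import json
import sqlite3

# Minimal sketch using the standard-library sqlite3 module (no ORM).
conn = sqlite3.connect("specs.db")
conn.execute("CREATE TABLE IF NOT EXISTS specs (item TEXT PRIMARY KEY, data TEXT)")

def save_spec(item, dat):
    # INSERT OR REPLACE makes re-runs after a crash idempotent
    conn.execute("INSERT OR REPLACE INTO specs VALUES (?, ?)",
                 (item, json.dumps(dat)))
    conn.commit()  # commit each row so a crash loses at most one item

def load_specs():
    return {item: json.loads(data)
            for item, data in conn.execute("SELECT item, data FROM specs")}
```

Committing after every row is slower than batching, but it means a shutdown mid-run costs you at most the item currently in flight.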

Personally, I favor the first suggestion: write in a special line like ###, then have custom code to split the input on those marks and then get the JSON objects.

EDIT: Okay, my first suggestion assumed that the JSON was formatted for human readability, with a bunch of short lines:

{
    "foo": 0,
    "bar": 1,
    "baz": 2
}

But it's all run together as one big long line:

{"foo":0,"bar":1,"baz":2}

Here are three ways to fix this.

0) Write a newline before the ### and after it, like so:

###
{"foo":0,"bar":1,"baz":2}
###
{"foo":0,"bar":1,"baz":2}

Then each input line will alternately be ### or a complete JSON object.

1) As long as SPECIAL_LINE is completely unique (never appears inside a string in the JSON) you can do this:

SPECIAL_LINE = '###'  # must be guaranteed never to appear inside the JSON

with open("specs_data.txt", "r") as f:
    temp = f.read()  # read entire file contents
    lst = temp.split(SPECIAL_LINE)
    # skip the empty pieces that split() produces at the edges
    json_objects = [json.loads(x) for x in lst if x.strip()]
    for j in json_objects:
        pass # do something with JSON object j

The .split() method can split up the temp string into JSON objects for you.

2) If you are certain that each JSON object will never have a newline character inside it, you could simply write JSON objects to the file, one after another, putting a newline after each; then assume that each line is a JSON object:

import json

def json_get_objects(f):
    for line in f:
        if line.strip():
            yield json.loads(line)

with open("specs_data.txt", "r") as f:
    for j in json_get_objects(f):
        pass # do something with JSON object j
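For completeness, the writing side of option (2) would be something like this (the helper name is mine); json.dump() emits no newlines by default, so each object occupies exactly one line:

```python
import json

def append_json_line(path, obj):
    # One JSON object per line; json.dump() produces no embedded newlines
    # unless you ask for pretty-printing, so each call appends one line.
    with open(path, "a") as f:
        json.dump(obj, f)
        f.write("\n")
```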

I like the simplicity of option (2), but I like the reliability of option (0). If a newline ever got written in as part of a JSON object, option (0) would still work, but option (2) would error.

Again, you can also simply use an actual database (SQLite) with an ORM and let the database worry about the details.

Good luck.

Sign up to request clarification or add additional context in comments.

6 Comments

1) First I replaced the {Json}{Json} with {Json}###{Json} in my input. Then I replaced SPECIAL_LINE in both places in the code with '###' and ran it. 2) The generator always reads the entire text of my input file, not a single line at a time; even when I introduce a newline with '###', the generator only reads the first line.
Here is how my input file looks: 1st Json starts here {"enginesCount": 1, "engines": [{"cylinder": 8, "code": "8", "name": "E", "type": "d", "compressorType": "t", "torque": 430, "equipmentType": "E", "id": "2", "horsepower": 195, "configuration": "V", "fuelType": "d", "availability": "STANDARD", "size": 6.5}]} 2nd Json starts here {"enginesCount": 1, "engines": [{"cylinder": 8, "code": "8", "name": "E", "type": "d", "compressorType": "tu", "torque": 430, "equipmentType": "E", "id": "20", "horsepower": 195, "configuration": "V", "fuelType": "d", "availability": "S", "size": 6.5}]}
Okay, sorry that I didn't anticipate that. Your JSON file doesn't have a bunch of short lines, it is all one long line. I'll update the answer.
Thank you again. Your suggestion of using SQLite seems better to me. I am thinking of using python peewee for that.
I used the code you wrote in (1). At json_objects = [json.loads(x) for x in lst] it gives me ValueError: No JSON object could be decoded. I used it on the example you gave too, and it gives me the same error.

Append the JSON data to a dict on every loop iteration.

At the end, dump this dict as JSON and write it to a file.

To give you an idea of appending data to a dict:

>>> d1 = {'suku':12}
>>> t1 = {'suku1':212}
>>> d1.update(t1)
>>> d1
{'suku1': 212, 'suku': 12}
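Putting that together with the question's loop, the whole approach might look like this sketch (style_ls and get_json are replaced here by stand-in data):

```python
import json

# Sketch: accumulate everything into one dict with update(), then
# dump it once at the end. The loop data below is made up.
specs_dict = {}
for item in ["a", "b"]:               # stand-in for style_ls
    dat = {"value": item * 2}         # stand-in for get_json(api, item)
    specs_dict.update({item: dat})

with open("specs_data.txt", "w") as myfile:
    json.dump(specs_dict, myfile)
```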

1 Comment

There is a chance the program terminates during the loop, in which case the dict would never be written out at all.
