
Assume I have the following lists

list1 = [{"created_at": "2012-01-31T10:00:04Z"},{"created_at": "2013-01-31T10:00:04Z"}] 
list2 = [{"created_at": "2014-01-31T10:00:04Z"}] 

I can write the first list to a JSON file using json.dump(list1, file, indent=2), and the result is:

[
  {
    "created_at": "2012-01-31T10:00:04Z"
  },
  {
    "created_at": "2013-01-31T10:00:04Z"
  }
]

My question is: how do I append the contents of the second list? If I simply do json.dump(list2, file, indent=2), it results in an invalid JSON file, as below.

[
  {
    "created_at": "2012-01-31T10:00:04Z"
  },
  {
    "created_at": "2013-01-31T10:00:04Z"
  }
][
  {
    "created_at": "2014-01-31T10:00:04Z"
  }
]

Edit: The lists are created dynamically by parsing about 8000 files. The above lists are just examples. I could potentially be writing 8000 lists to the JSON file, so simple appending will not work.

  • If you mean "append" in the file sense (i.e., opening the file with mode "a"), I doubt you can. Commented Nov 25, 2013 at 7:39
  • As I understand it, you know how to extend lists in Python and are asking how to correctly dump the lists into a JSON file. I don't think there is a way to do it the way you want. Redesign your program, if you can, to have only one dump call. Commented Nov 25, 2013 at 7:39
  • "simple appending will not work." Did you try it? Commented Nov 25, 2013 at 7:41
  • @LutzHorn lists are created inside a method. So when I am parsing a file, I don't have the list from previous files. Even if I did have, the total data is ~50 gigs. Should I retain lists of that size throughout the program execution? Commented Nov 25, 2013 at 7:45
  • It doesn't sound like JSON is the right format for the job then. It's more for transfer than storage - basically for the reason you've mentioned: To append you need to parse all the data which, in this case, is unreasonable. Commented Nov 25, 2013 at 7:49
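As the last comment suggests, an append-friendly alternative is the JSON Lines layout: one JSON object per line, no enclosing array, so opening the file in mode "a" just works. A minimal sketch (the file name records.jsonl is my own choice):

```python
import json

list1 = [{"created_at": "2012-01-31T10:00:04Z"},
         {"created_at": "2013-01-31T10:00:04Z"}]
list2 = [{"created_at": "2014-01-31T10:00:04Z"}]

# Append each object as one line; mode "a" is safe because there are
# no enclosing brackets that need repairing after each write.
with open("records.jsonl", "a") as f:
    for obj in list1 + list2:
        f.write(json.dumps(obj) + "\n")

# Reading back: parse one line at a time.
with open("records.jsonl") as f:
    records = [json.loads(line) for line in f]
```

The trade-off is that the file as a whole is not a single JSON document, so tools that expect one top-level array cannot read it directly.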

4 Answers

In [1]: import json

In [2]: list1 = [{"created_at": "2012-01-31T10:00:04Z"},{"created_at": "2013-01-31T10:00:04Z"}] 

In [3]: list2 = [{"created_at": "2014-01-31T10:00:04Z"}] 

In [4]: list1.extend(list2)

In [5]: json.dumps(list1)
Out[5]: '[{"created_at": "2012-01-31T10:00:04Z"}, {"created_at": "2013-01-31T10:00:04Z"}, {"created_at": "2014-01-31T10:00:04Z"}]'

or

In [8]: json.dumps(list1 + list2)
Out[8]: '[{"created_at": "2012-01-31T10:00:04Z"}, {"created_at": "2013-01-31T10:00:04Z"}, {"created_at": "2014-01-31T10:00:04Z"}]'

2 Comments

I have about 8000 files that I parse to create lists. These lists have to be written into 3 files, so appending will not work.
Then please rephrase your question and include all relevant information.

While parsing the files, append (or extend) a single accumulator list, and convert it to JSON once at the end. Assume your parsing function is called parse:

>>> import json
>>> result = []
>>> for name in files:
...     result.extend(parse(name))  # extend, since parse() returns a list per file
...
>>> json.dump(result, outfile, indent=2)

Comments


The other answers omit a point worth making explicit: a JSON file can have only a single top-level element. So if you dump the first list on one iteration and the second list on the next, you get a malformed file, because JSON requires the two lists to be wrapped inside one list/array before dumping.

Therefore, collect all the lists into one list (by appending, or by any of the other methods mentioned above), and then dump that aggregated list into your JSON file. If you do not want to do that, you will have to create a separate file for each list.
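A minimal sketch of that aggregation, with a hypothetical parse() standing in for the real per-file parser:

```python
import json

def parse(i):
    # Hypothetical stand-in for the real per-file parser:
    # returns a list of records for file number i.
    return [{"created_at": f"201{i}-01-31T10:00:04Z"}]

combined = []
for i in range(3):
    combined.extend(parse(i))  # extend flattens the per-file lists into one

with open("combined.json", "w") as f:
    json.dump(combined, f, indent=2)  # a single dump call -> one valid JSON array
```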

Comments


I had a similar problem: I wanted valid JSON, not JSON Lines, so I made a custom solution.

# "a+" is no good because you can't seek back before the previous end,
# so open with 'r+' when the file exists and 'w+' when it doesn't.
# I had multiple open calls (one output file per key), so open_output
# wraps something like:
#     open(out_path, 'r+' if path.isfile(out_path) else 'w+')

    for obj in data:
        for key in columns:
            with open_output(json_path, key) as o:
                o.seek(0, io.SEEK_END)
                if o.tell() == 0:
                    o.write('[\n')          # empty file: open the array
                else:
                    o.seek(o.tell() - 2)    # step back over the closing '\n]'
                    o.write(',\n')          # (2 chars with '\n' newlines; 3 if your platform writes '\r\n')
                json.dump({key: obj.get(key, columns[key])}, o, indent=2)
                o.write('\n]')              # re-close the array after every write

The output file is valid JSON after each write. Indentation was not relevant for me, but one line per object was not readable, so I ended up with this.
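The same seek-back-and-append idea as a self-contained sketch (the helper name append_record and the temp path are illustrative, not from the answer above):

```python
import io
import json
import os
import tempfile

def append_record(out_path, record):
    # "a+" would not let us seek before the previous end, so use r+/w+.
    mode = 'r+' if os.path.isfile(out_path) else 'w+'
    # newline='\n' keeps the trailing '\n]' exactly 2 characters on every platform.
    with open(out_path, mode, newline='\n') as o:
        o.seek(0, io.SEEK_END)
        if o.tell() == 0:
            o.write('[\n')            # first record: open the array
        else:
            o.seek(o.tell() - 2)      # step back over the closing '\n]'
            o.write(',\n')
        json.dump(record, o, indent=2)
        o.write('\n]')                # re-close the array after every write

path = os.path.join(tempfile.mkdtemp(), 'out.json')
for rec in [{"created_at": "2012-01-31T10:00:04Z"},
            {"created_at": "2014-01-31T10:00:04Z"}]:
    append_record(path, rec)

with open(path) as f:
    data = json.load(f)  # the file parses as valid JSON after every append
```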

Comments
