
Suppose I have two json files. I would like to be able to load both, then add the entries from the second into the first. This may include adding fields or list entries. Something like the following example:

file1.json:

    {
        "fruit": [
            { "name": "apple", "color": "red" },
            { "name": "orange", "color": "orange" }
        ]
    }

file2.json:

    {
        "fruit": [
            { "name": "strawberry", "color": "red", "size": "small" },
            { "name": "orange", "size": "medium" }
        ]
    }

result:

    {
        "fruit": [
            { "name": "apple", "color": "red" },
            { "name": "orange", "color": "orange", "size": "medium" },
            { "name": "strawberry", "color": "red", "size": "small" }
        ]
    }

At first I thought to load them into dictionaries and try something like update:

    import simplejson

    # Read and parse both files; "with" closes each file automatically
    with open("file1.json") as filea:
        dicta = simplejson.load(filea)

    with open("file2.json") as fileb:
        dictb = simplejson.load(fileb)

    # Shallow merge: each top-level key in dictb replaces the one in dicta
    dicta.update(dictb)

Since both dictionaries have an entry for "fruit", I was hoping that they would merge, but it simply overrides the entry in dicta with the entry in dictb.
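To see that behavior concretely: dict.update is shallow, so for a key present in both dictionaries it replaces the old value (here, the whole "fruit" list) with the new one instead of combining them. A minimal self-contained sketch, using the standard-library json module and the sample documents inlined as strings rather than read from file1.json and file2.json:

```python
import json

# The two sample documents, inlined so the example runs without files.
dicta = json.loads('{"fruit": [{"name": "apple", "color": "red"}, '
                   '{"name": "orange", "color": "orange"}]}')
dictb = json.loads('{"fruit": [{"name": "strawberry", "color": "red", "size": "small"}, '
                   '{"name": "orange", "size": "medium"}]}')

# update() only looks at top-level keys: dicta["fruit"] is rebound to
# dictb's list object, and dicta's original list is discarded.
dicta.update(dictb)

print(dicta["fruit"] is dictb["fruit"])                # True
print("apple" in [f["name"] for f in dicta["fruit"]])  # False
```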

I realize I could write code to loop through this example, but the actual files I'm using are far larger and more complicated. I was wondering if there was a library out there that did something like this already before I go reinventing the wheel. For what it's worth, I am using Python 2.6.2.

Thanks for any advice or suggestions!

  • So you want to join the elements of fruit on their name value? Do you have control of the json format? And what are the rules if file1 and file2 have conflicting data in other fields (ex both have a color for apple)? Commented Sep 5, 2012 at 15:55
  • In this particular example I would like to join based on the name value, yes. In real life there would be two specific fields that had to match instead of just the one, but the concept is similar. I have full control over the format of the second file and none whatsoever over the first file. There should never be conflicts, so the behavior in that case could be whatever is easier (e.g. override with the new one or keep the old one) Commented Sep 5, 2012 at 15:57
  • Excellent question, sorry I neglected to mention that sooner. I am using Python 2.6.2, and I have added that to the original post. Commented Sep 5, 2012 at 16:03

2 Answers


You'll need to extend the lists, checking each value. There's no way Python can know you want to merge them based on the name item of the dictionaries:

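    def merge(lsta, lstb):
        """Merge the dicts in lstb into lsta, matching them on 'name'."""
        for i in lstb:
            for j in lsta:
                if j['name'] == i['name']:
                    # Same item: copy lstb's fields onto the existing dict
                    j.update(i)
                    break
            else:
                # Inner loop finished without break: no match, so add it
                lsta.append(i)

    # Merge each top-level list in dictb into the matching list in dicta
    for k, v in dictb.items():
        merge(dicta.setdefault(k, []), v)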

So the dicta variable will be:

    {'fruit': [{'color': 'red', 'name': 'apple'},
               {'color': 'orange', 'name': 'orange', 'size': 'medium'},
               {'color': 'red', 'name': 'strawberry', 'size': 'small'}]}
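One of the comments on the question mentions that in the real files two fields have to match rather than just "name". Here is a minimal sketch of how the merge function above could be generalized to compare a tuple of fields. merge_on and its keys parameter are hypothetical names, not part of the answer, and the sample data is inlined as strings so the snippet runs on its own.

```python
import json


def merge_on(lsta, lstb, keys):
    """Merge the dicts in lstb into lsta in place and return lsta.

    Two entries count as the same item when they agree on every field
    named in keys. Matching entries are combined with dict.update (so
    lstb's values win on conflicts); unmatched entries are appended.
    A missing key field compares as None.
    """
    for item in lstb:
        ident = tuple(item.get(k) for k in keys)
        for existing in lsta:
            if tuple(existing.get(k) for k in keys) == ident:
                existing.update(item)
                break
        else:
            # Inner loop ended without break: no existing entry matched.
            lsta.append(item)
    return lsta


dicta = json.loads('{"fruit": [{"name": "apple", "color": "red"}, '
                   '{"name": "orange", "color": "orange"}]}')
dictb = json.loads('{"fruit": [{"name": "strawberry", "color": "red", "size": "small"}, '
                   '{"name": "orange", "size": "medium"}]}')

# Merge every top-level list in dictb into the matching list in dicta.
for k, v in dictb.items():
    merge_on(dicta.setdefault(k, []), v, keys=("name",))

print(json.dumps(dicta, sort_keys=True))
```

For the two-field case you would pass both field names, e.g. keys=("name", "variety"), where "variety" is only a placeholder for whatever the second real field is.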

1 Comment

I was hoping not to write code specifically for the format of the json file, because it is apt to change. Ideally I wanted a library or general function that could combine any two json files together. I suppose that was too much to hope for! Still, this code does perfectly solve the sample I posted and demonstrates the basis for what I will have to do on a larger and more complicated scale. I hope it will help anyone else facing a similar problem as well. Thanks for your answer!

Given the parsed JSON documents in a list, parsed_json:

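    from collections import defaultdict

    # Re-key each document's fruit list by name: {name: {other fields}}
    transformed_data = []
    for data in parsed_json:
        transformed_data.append({})
        for fruit in data['fruit']:
            fruit_copy = fruit.copy()
            transformed_data[-1][fruit_copy.pop('name')] = fruit_copy

    # Combine the per-document dicts; later documents win on conflicts
    merged_fruit = defaultdict(dict)
    for fruits_by_name in transformed_data:
        for name, values in fruits_by_name.items():
            merged_fruit[name].update(values)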

You could do it with a dict comprehension in 2.7+, but you said 2.6.2. Since you said that in the real world you are merging on more than one field, you can just change the key used when filling in transformed_data to whatever fields from the source you want (for example, a tuple of several field values). If you don't care about destroying the original parsed data, you can also drop the copy.
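Along the same lines, here is a minimal sketch that wraps this answer's idea in a function, joins on a tuple of fields, and turns the merged result back into the original {"fruit": [...]} shape. merge_keyed, list_field and key_fields are hypothetical names. Unlike the code above, it keeps the key fields inside each entry instead of popping them, and it builds fresh dicts rather than updating the parsed ones, so no copy() is needed (nested values are still shared).

```python
import json
from collections import defaultdict


def merge_keyed(docs, list_field, key_fields):
    """Merge doc[list_field] across all docs, joining entries on key_fields.

    Assumes every entry has all of key_fields. Entries are grouped by the
    tuple of those values and combined with dict.update, so later
    documents win on conflicting fields. First-seen order is preserved.
    """
    merged = defaultdict(dict)
    order = []  # keys in the order they were first seen
    for doc in docs:
        for entry in doc.get(list_field, []):
            key = tuple(entry[f] for f in key_fields)
            if key not in merged:  # "in" does not create defaultdict entries
                order.append(key)
            merged[key].update(entry)
    # Rebuild the original document shape: {list_field: [entry, ...]}
    return {list_field: [merged[key] for key in order]}


parsed_json = [
    json.loads('{"fruit": [{"name": "apple", "color": "red"}, '
               '{"name": "orange", "color": "orange"}]}'),
    json.loads('{"fruit": [{"name": "strawberry", "color": "red", "size": "small"}, '
               '{"name": "orange", "size": "medium"}]}'),
]

result = merge_keyed(parsed_json, "fruit", ("name",))
print(json.dumps(result, sort_keys=True))
```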

3 Comments

I had to play with this code a bit to get it to work for me, but in the end it does seem to solve the sample problem I posted. I like the idea of using multiple fields as the key, but it still only works for this basic JSON structure. The real-world case involves much more complicated structures containing structures containing lists of lists, and so on. I was hoping there was a library somewhere that could do it all, but it looks like in the end I'll have to write code for the specific format of the file. Thanks for your help and for the code sample you provided.
@xaevinx I could be wrong, but something like this is probably going to be quite difficult to find outside of an ORM. It is something that lots of people will have to do, but the logic for the transform is going to be so tailored to each scenario that generalizing the behavior into a library would be difficult and probably more costly than just rolling your own.
I figured a generalized library would be suboptimal, but that would still be ideal for me. The time isn't much of a concern at all. The only problem for me is that the files continue to change, which will force the code to change each time as well.
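For anyone who wants something closer to the general function discussed in these comments, here is a minimal sketch of a recursive merge. The merge rules are assumptions, not anything from this thread, and deep_merge, _has_ids, _find_match and the id_keys parameter are hypothetical names. It uses a single set of identifying fields at every depth, so structures that need different matching rules at different levels would still need tailored code, which is the problem raised above.

```python
import json


def deep_merge(base, other, id_keys=("name",)):
    """Recursively merge other into base and return the merged value.

    Assumed rules (hypothetical; adjust to your own files):
    - dict + dict: merge key by key, recursing into keys present in both.
    - list + list: a dict item of other that carries every id_key is
      merged into the first dict in base with the same id values; any
      other item is appended unless an equal item is already present.
    - anything else: the value from other wins.
    base is modified in place; items taken from other are not copied.
    """
    if isinstance(base, dict) and isinstance(other, dict):
        for key, value in other.items():
            if key in base:
                base[key] = deep_merge(base[key], value, id_keys)
            else:
                base[key] = value
        return base
    if isinstance(base, list) and isinstance(other, list):
        for item in other:
            match = _find_match(base, item, id_keys)
            if match is not None:
                deep_merge(match, item, id_keys)  # match is updated in place
            elif item not in base:
                base.append(item)
        return base
    return other


def _has_ids(value, id_keys):
    """True if value is a dict containing every identifying key."""
    return isinstance(value, dict) and all(k in value for k in id_keys)


def _find_match(items, item, id_keys):
    """Return the dict in items with the same id values as item, else None."""
    if not _has_ids(item, id_keys):
        return None
    ident = [item[k] for k in id_keys]
    for candidate in items:
        if _has_ids(candidate, id_keys) and [candidate[k] for k in id_keys] == ident:
            return candidate
    return None


file1 = ('{"fruit": [{"name": "apple", "color": "red"}, '
         '{"name": "orange", "color": "orange"}]}')
file2 = ('{"fruit": [{"name": "strawberry", "color": "red", "size": "small"}, '
         '{"name": "orange", "size": "medium"}]}')

merged = deep_merge(json.loads(file1), json.loads(file2))
print(json.dumps(merged, indent=2, sort_keys=True))
```

Passing id_keys=("name", "variety"), with "variety" standing in for a real second field, would require entries to agree on both fields before they are merged.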
