How can I convert the following JSON to the target format below? I have 50 thousand entries.
Basically, I want to collect the unique countries and put every entry that shares a country name into one array keyed by that country.

original json:

[
    {
        "unilist": [
                {
                    "country": "United States",
                    "name": "The College of New Jersey",
                    "web_page": "http://www.tcnj.edu"
                },
                {
                    "country": "United States",
                    "name": "Abilene Christian University",
                    "web_page": "http://www.acu.edu/"
                },
                {
                    "country": "United States",
                    "name": "Adelphi University",
                    "web_page": "http://www.adelphi.edu/"
                },
                {
                    "country": "China",
                    "name": "Harbin Medical University",
                    "web_page": "http://www.hrbmu.edu.cn/"
                },
                {
                    "country": "China",
                    "name": "Harbin Normal University",
                    "web_page": "http://www.hrbnu.edu.cn/"
                }
                ...
                ]
    }
]

target format:

{
"unilist" : {
        "United States" : [
          {"name" : "The College of New Jersey", "web_page" : "http://www.tcnj.edu"},
          {"name" : "Abilene Christian University", "web_page" : "http://www.acu.edu/"},
          {"name" : "Adelphi University", "web_page" : "http://www.adelphi.edu/"}
        ],
        "China" : [
          {"name" : "Harbin Medical University", "web_page" : "http://www.hrbmu.edu.cn/"},
          {"name" : "Harbin Normal University", "web_page" : "http://www.hrbnu.edu.cn/"}
        ],
        ...
    }
}

update

My attempt (in Python 2.7.11), based on the answer provided by downshift. It is not working as expected; I get the following TypeError:

from collections import defaultdict
import json
from pprint import pprint

with open('old_list.json') as orig_json:    
    newlist = defaultdict(list)

for country in orig_json[0]['unilist']:
    newlist[country['country']].append({'name': country['name'], 'web_page': country['web_page']})

with open('new_list.json', 'w') as fp:
            json.dump(newlist,fp)


pprint.pprint(dict(newlist))


TypeError:

Traceback (most recent call last):
  File "convert.py", line 8, in <module>
    for country in orig_json[0]['unilist']:
TypeError: 'file' object has no attribute '__getitem__'

1 Answer

This produces almost the same target output, only it's missing the "unilist" key. But at least it does group entries by country:

import json
from collections import defaultdict

with open('original.json', 'r') as original:
    # Parse the whole file; the top level is a list holding a single object
    orig_json = json.load(original)

oj = orig_json[0]  # The object that carries the "unilist" key

newlist = defaultdict(list)

for country in oj['unilist']:
    newlist[country['country']].append({'name': country['name'], 
                                        'web_page': country['web_page']})

with open('new.json', 'w') as outfile:
    json.dump(newlist, outfile)

This will save newlist to the JSON file 'new.json'.

Output:

{'China': [{'name': 'Harbin Medical University',
            'web_page': 'http://www.hrbmu.edu.cn/'},
           {'name': 'Harbin Normal University',
            'web_page': 'http://www.hrbnu.edu.cn/'}],
 'United States': [{'name': 'The College of New Jersey',
                    'web_page': 'http://www.tcnj.edu'},
                   {'name': 'Abilene Christian University',
                    'web_page': 'http://www.acu.edu/'},
                   {'name': 'Adelphi University',
                    'web_page': 'http://www.adelphi.edu/'}]}
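
Note that the countries above do not come out in the order they appear in the input: on Python 2.7 a plain dict (and a defaultdict) does not keep insertion order. If the order matters to you, one option is an OrderedDict. This is only a minimal sketch, and the helper name group_in_order is my own, not something from a library:

```python
from collections import OrderedDict


def group_in_order(entries):
    """Group entries by country, keeping countries in first-seen order."""
    grouped = OrderedDict()
    for entry in entries:
        # setdefault creates the list the first time a country is seen
        grouped.setdefault(entry['country'], []).append(
            {'name': entry['name'], 'web_page': entry['web_page']})
    return grouped
```

json.dump writes an OrderedDict's keys in their stored order, so the saved file would list the countries in the same sequence.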

I'll update this answer if I figure out a better way to get the exact target output. In the meantime, I hope this helps you.
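
For what it's worth, here is a sketch of one way to get the "unilist" key back and to avoid the TypeError from the updated question. That error appears because open() only returns a file object; the contents have to be parsed with json.load() before they can be indexed. The function names regroup and convert_file are just names I picked, and I'm assuming the top-level list holds a single object, as in the sample data:

```python
import json
from collections import defaultdict


def regroup(data):
    """Group the parsed original JSON by country under a 'unilist' key.

    Assumes `data` is a list whose first element has a 'unilist' list.
    """
    grouped = defaultdict(list)
    for entry in data[0]['unilist']:
        grouped[entry['country']].append(
            {'name': entry['name'], 'web_page': entry['web_page']})
    # Wrap the grouped entries so the output matches the target layout
    return {'unilist': dict(grouped)}


def convert_file(src_path, dst_path):
    """Read the original JSON file, regroup it, and write the result."""
    with open(src_path) as src:
        data = json.load(src)  # parse the file; the file object itself cannot be indexed
    with open(dst_path, 'w') as dst:
        json.dump(regroup(data), dst, indent=2)
```

Calling convert_file('old_list.json', 'new_list.json') with the file names from the question should then write the grouped data under a single "unilist" key.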

6 Comments

ok thanks, i am going to test this and let you know what the result is.
how do I open and define my input file, and how do I define defaultdict?
Oh sorry, I should have included that module import: from collections import defaultdict. added to edit
sorry i am a newbie here, could you help me to define and read my file into the code and then save it as another output file?
i updated my question based on your answer, could you please look at the error the code is throwing. why is it happening?