How can I convert the following JSON to the target format below? I have 50 thousand entries.
Basically, I want to take each unique country in the array and group all entries that share that country name under one array keyed by the country.
Original JSON:
[
  {
    "unilist": [
      {
        "country": "United States",
        "name": "The College of New Jersey",
        "web_page": "http://www.tcnj.edu"
      },
      {
        "country": "United States",
        "name": "Abilene Christian University",
        "web_page": "http://www.acu.edu/"
      },
      {
        "country": "United States",
        "name": "Adelphi University",
        "web_page": "http://www.adelphi.edu/"
      },
      {
        "country": "China",
        "name": "Harbin Medical University",
        "web_page": "http://www.hrbmu.edu.cn/"
      },
      {
        "country": "China",
        "name": "Harbin Normal University",
        "web_page": "http://www.hrbnu.edu.cn/"
      }
      ...
    ]
  }
]
Target format:
{
  "unilist" : {
    "United States" : [
      {"name" : "The College of New Jersey", "web_page" : "http://www.tcnj.edu"},
      {"name" : "Abilene Christian University", "web_page" : "http://www.acu.edu/"},
      {"name" : "Adelphi University", "web_page" : "http://www.adelphi.edu/"}
    ],
    "China" : [
      {"name" : "Harbin Medical University", "web_page" : "http://www.hrbmu.edu.cn/"},
      {"name" : "Harbin Normal University", "web_page" : "http://www.hrbnu.edu.cn/"}
    ],
    ...
  }
}
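To make the intended transformation concrete, here is a minimal sketch of the grouping step, assuming the data has already been parsed into Python lists and dicts. The function name `group_by_country` and the three-entry `sample` are made up for illustration; they are not part of the original data or of any library.

```python
def group_by_country(data):
    """Group the entries of data[0]['unilist'] by their 'country' key."""
    grouped = {}
    for entry in data[0]["unilist"]:
        # Keep every field except 'country' in the per-country record.
        record = {key: value for key, value in entry.items() if key != "country"}
        # setdefault creates the list the first time a country is seen.
        grouped.setdefault(entry["country"], []).append(record)
    return {"unilist": grouped}


# Hypothetical sample in the same shape as the original JSON.
sample = [{"unilist": [
    {"country": "United States", "name": "Adelphi University",
     "web_page": "http://www.adelphi.edu/"},
    {"country": "China", "name": "Harbin Medical University",
     "web_page": "http://www.hrbmu.edu.cn/"},
    {"country": "China", "name": "Harbin Normal University",
     "web_page": "http://www.hrbnu.edu.cn/"},
]}]

result = group_by_country(sample)
```

This is a single pass over the list, so 50 thousand entries should not be a concern for this approach.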
Update
My attempt (in Python 2.7.11), based on the answer provided by downshift, is below. It does not work as expected and raises a TypeError:
from collections import defaultdict
import json
from pprint import pprint
with open('old_list.json') as orig_json:
    newlist = defaultdict(list)
    for country in orig_json[0]['unilist']:
        newlist[country['country']].append({'name': country['name'], 'web_page': country['web_page']})
with open('new_list.json', 'w') as fp:
    json.dump(newlist,fp)
pprint.pprint(dict(newlist))
TypeError:
Traceback (most recent call last):
  File "convert.py", line 8, in <module>
    for country in orig_json[0]['unilist']:
TypeError: 'file' object has no attribute '__getitem__'
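The traceback suggests that `orig_json` is still the open file object rather than the parsed data, so indexing it with `[0]` fails. A possible fix is to parse the file with `json.load` first. Separately, because of `from pprint import pprint`, the name `pprint` is already the function, so it would be called as `pprint(...)` rather than `pprint.pprint(...)`. The sketch below is one way this could look; the `convert` helper and the temporary directory holding a tiny `old_list.json` are only there to make the example self-contained and are not part of the original script.

```python
import json
import os
import tempfile
from collections import defaultdict
from pprint import pprint


def convert(src_path, dst_path):
    """Read the original JSON, group entries by country, write the result."""
    with open(src_path) as src:
        data = json.load(src)  # parse the file; the file object itself is not indexable
    grouped = defaultdict(list)
    for uni in data[0]["unilist"]:
        grouped[uni["country"]].append(
            {"name": uni["name"], "web_page": uni["web_page"]})
    result = {"unilist": dict(grouped)}
    with open(dst_path, "w") as dst:
        json.dump(result, dst, indent=2)
    return result


# Illustrative setup: write a small input file into a temporary directory.
workdir = tempfile.mkdtemp()
old_path = os.path.join(workdir, "old_list.json")
new_path = os.path.join(workdir, "new_list.json")
with open(old_path, "w") as fp:
    json.dump([{"unilist": [
        {"country": "China", "name": "Harbin Medical University",
         "web_page": "http://www.hrbmu.edu.cn/"},
        {"country": "China", "name": "Harbin Normal University",
         "web_page": "http://www.hrbnu.edu.cn/"},
    ]}], fp)

converted = convert(old_path, new_path)
pprint(converted)  # pprint is the function here, not the module
```

With the real data, the call would presumably be `convert('old_list.json', 'new_list.json')`.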