I am attempting to convert nested JSON data to a flat table:
(I have edited this question: I thought I had a working solution and was asking for advice on optimisation, but it turns out it doesn't work.)
import pandas as pd
import json
from collections import OrderedDict

# https://stackoverflow.com/questions/36720940/parsing-nested-json-into-dataframe
def flatten_json(json_object, container=None, name=''):
    if container is None:
        container = OrderedDict()
    if isinstance(json_object, dict):
        for key in json_object:
            flatten_json(json_object[key], container=container, name=name + key + '_')
    elif isinstance(json_object, list):
        for n, item in enumerate(json_object, 1):
            flatten_json(item, container=container, name=name + str(n) + '_')
    else:
        container[str(name[:-1])] = str(json_object)
    return container

data = '{"page":1,"pages":2,"totaItems":22,"data":[{"eId":38344,"bId":29802,"fname":"Adon","cId":21,"cName":"Regional","vType":"None","totalMinutes":590,"minutesExcludingViolations":590,"sId":15,"snme":"CD","customFields":[{"id":3,"value":false},{"id":4,"value":false},{"id":5,"value":"2056-04-05T00:00:00Z"}]},{"eId":38344,"bId":29802,"fname":"Adon","cId":21,"cName":"Regional","vType":"None","totalMinutes":590,"minutesExcludingViolations":590,"sId":15,"snme":"CD","customFields":[{"id":3,"value":false},{"id":4,"value":false}]}]}'
json_data = json.loads(data)

dataframes = list()
for record in json_data['data']:
    out = pd.DataFrame(flatten_json(record), index=[0])
    dataframes.append(out)
frame = pd.concat(dataframes)
print(frame)
However, I can't help but feel this might be overly complicated for what I am trying to achieve. This script is the result of a few hours' research and it's the best I can come up with. Does anyone have any pointers or advice on how to refine it?
I'm essentially completely flattening the JSON data (everything under the top-level "data" key) into a dataframe, to be exported to CSV later.
Ideal output:
+-------+-----+----------+----------------+----------------+----------------------+-------+-------+----------------------------+-----+------+--------------+-------+
| bId   | cId | cName    | customFields_3 | customFields_4 | customFields_5       | eId   | fname | minutesExcludingViolations | sId | snme | totalMinutes | vType |
+-------+-----+----------+----------------+----------------+----------------------+-------+-------+----------------------------+-----+------+--------------+-------+
| 29802 | 21  | Regional | FALSE          | FALSE          | 2056-04-05T00:00:00Z | 38344 | Adon  | 590                        | 15  | CD   | 590          | None  |
| 29802 | 21  | Regional | FALSE          | FALSE          | null                 | 38344 | Adon  | 590                        | 15  | CD   | 590          | None  |
+-------+-----+----------+----------------+----------------+----------------------+-------+-------+----------------------------+-----+------+--------------+-------+
EDIT: I hadn't noticed at first, but the flatten_json solution doesn't produce the output I want. I've added my ideal output and shortened the input data slightly to make it easier to work with for now.
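To illustrate the mismatch, here is a small sketch that runs the same flatten_json function on a cut-down record (the `record` literal is a hypothetical sample in the same shape as the input, not the full data). It suggests the list items are named by their position in the list, with `id` and `value` as separate columns, rather than by the `id` they carry:

```python
from collections import OrderedDict

def flatten_json(json_object, container=None, name=''):
    # Same function as in the question, repeated so the sketch runs on its own.
    if container is None:
        container = OrderedDict()
    if isinstance(json_object, dict):
        for key in json_object:
            flatten_json(json_object[key], container=container, name=name + key + '_')
    elif isinstance(json_object, list):
        for n, item in enumerate(json_object, 1):
            flatten_json(item, container=container, name=name + str(n) + '_')
    else:
        container[str(name[:-1])] = str(json_object)
    return container

# Hypothetical cut-down record with two custom fields (ids 3 and 5).
record = {
    "eId": 38344,
    "customFields": [
        {"id": 3, "value": False},
        {"id": 5, "value": "2056-04-05T00:00:00Z"},
    ],
}

flat = flatten_json(record)
for column, value in flat.items():
    print(column, '=', value)
```

The custom fields come out as customFields_1_id, customFields_1_value, customFields_2_id and customFields_2_value, so the field with id 5 lands in a "2" column instead of a customFields_5 column.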
EDIT2: A possible solution, which appears to give the right output:
main_frame = pd.DataFrame(json_data['data'])
del main_frame['customFields']

frames = list()
for record in json_data['data']:
    out = pd.DataFrame.from_records(record['customFields']).T
    out = out.reset_index(drop=True)
    out.columns = out.iloc[0]
    out = out.reindex(out.index.drop(0))
    frames.append(out)

custom_fields_frame = pd.concat(frames).reset_index(drop=True)
main_frame = main_frame.join(custom_fields_frame)
print(main_frame)
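A possibly simpler variant of the same idea, as a rough sketch only: build one plain dict per record and turn each custom field into a customFields_<id> key before handing the rows to pandas. The helper name `flatten_record` and the shortened `data` string are made up for illustration, and the sketch assumes each `id` appears at most once per record:

```python
import json
import pandas as pd

# Shortened, hypothetical sample in the same shape as the real input.
data = ('{"data":['
        '{"eId":38344,"fname":"Adon","customFields":'
        '[{"id":3,"value":false},{"id":5,"value":"2056-04-05T00:00:00Z"}]},'
        '{"eId":38344,"fname":"Adon","customFields":'
        '[{"id":3,"value":false}]}'
        ']}')

def flatten_record(record):
    """Copy the scalar fields and add one customFields_<id> key per custom field."""
    row = {key: value for key, value in record.items() if key != 'customFields'}
    for field in record.get('customFields', []):
        row['customFields_{}'.format(field['id'])] = field['value']
    return row

rows = [flatten_record(record) for record in json.loads(data)['data']]

# Records that lack a given custom field get NaN in that column.
frame = pd.DataFrame(rows)

# Sort the columns alphabetically, as in the ideal output table.
frame = frame[sorted(frame.columns)]

# na_rep controls how missing values are written in the CSV text.
print(frame.to_csv(index=False, na_rep='null'))
```

Because the column name is built from the `id` rather than from the position in the list, a record that is missing a field simply has no value in that column, and no transposing or re-indexing is needed.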
Thanks,