Python Pandas Nested JSON to Dataframe

Question

I am attempting to convert nested JSON data to a flat table:

(Edited: I originally thought I had a working solution and was asking for advice on optimisation, but it turns out it does not work.)

import pandas as pd
import json
from collections import OrderedDict


# https://stackoverflow.com/questions/36720940/parsing-nested-json-into-dataframe
def flatten_json(json_object, container=None, name=''):
    if container is None:
        container = OrderedDict()
    if isinstance(json_object, dict):
        for key in json_object:
            flatten_json(json_object[key], container=container, name=name + key + '_')
    elif isinstance(json_object, list):
        for n, item in enumerate(json_object, 1):
            flatten_json(item, container=container, name=name + str(n) + '_')
    else:
        container[str(name[:-1])] = str(json_object)
    return container


data = '{"page":1,"pages":2,"totaItems":22,"data":[{"eId":38344,"bId":29802,"fname":"Adon","cId":21,"cName":"Regional","vType":"None","totalMinutes":590,"minutesExcludingViolations":590,"sId":15,"snme":"CD","customFields":[{"id":3,"value":false},{"id":4,"value":false},{"id":5,"value":"2056-04-05T00:00:00Z"}]},{"eId":38344,"bId":29802,"fname":"Adon","cId":21,"cName":"Regional","vType":"None","totalMinutes":590,"minutesExcludingViolations":590,"sId":15,"snme":"CD","customFields":[{"id":3,"value":false},{"id":4,"value":false}]}]}'

json_data = json.loads(data)

dataframes = list()

for record in json_data['data']:
    out = pd.DataFrame(flatten_json(record), index=[0])
    dataframes.append(out)

frame = pd.concat(dataframes)

print(frame)

However, I can't help but feel this might be overly complicated for what I am trying to achieve. This script is the result of a few hours' research and it's the best I can come up with. Does anyone have any pointers or advice on how to refine it?

I'm essentially completely flattening the JSON data (under the data record) into a dataframe to later be exported to CSV.

Ideal output:

+-------+-----+----------+----------------+----------------+----------------------+-------+-------+----------------------------+-----+------+--------------+-------+
|  bId  | cId |  cName   | customFields_3 | customFields_4 |    customFields_5    |  eId  | fname | minutesExcludingViolations | sId | snme | totalMinutes | vType |
+-------+-----+----------+----------------+----------------+----------------------+-------+-------+----------------------------+-----+------+--------------+-------+
| 29802 |  21 | Regional | FALSE          | FALSE          | 2056-04-05T00:00:00Z | 38344 | Adon  |                        590 |  15 | CD   |          590 | None  |
| 29802 |  21 | Regional | FALSE          | FALSE          | null                 | 38344 | Adon  |                        590 |  15 | CD   |          590 | None  |
+-------+-----+----------+----------------+----------------+----------------------+-------+-------+----------------------------+-----+------+--------------+-------+

EDIT: I didn't notice at first, but this solution doesn't produce the output I need. I've added my idealised output and shortened the input data slightly to make it easier to work with for now.
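A likely reason the first script misses the idealised output is that flatten_json names columns after each item's position in the list, not after its "id" value. The sketch below reuses the function from the question on a shortened, hypothetical record to show the column names it produces.

```python
from collections import OrderedDict


def flatten_json(json_object, container=None, name=''):
    # Same logic as the function in the question.
    if container is None:
        container = OrderedDict()
    if isinstance(json_object, dict):
        for key in json_object:
            flatten_json(json_object[key], container=container, name=name + key + '_')
    elif isinstance(json_object, list):
        # List items are numbered by position, starting at 1.
        for n, item in enumerate(json_object, 1):
            flatten_json(item, container=container, name=name + str(n) + '_')
    else:
        container[str(name[:-1])] = str(json_object)
    return container


# Hypothetical record, shortened from the question's data for illustration.
record = {
    "eId": 38344,
    "customFields": [
        {"id": 3, "value": False},
        {"id": 5, "value": "2056-04-05T00:00:00Z"},
    ],
}

flat = flatten_json(record)
for column, value in flat.items():
    print(column, "=", value)
```

The keys come out as customFields_1_id, customFields_1_value, customFields_2_id and so on, so the field with id 5 lands in a column named after position 2, and every value is converted to a string.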

EDIT2: A possible solution, which gives the right output:

main_frame = pd.DataFrame(json_data['data'])
del main_frame['customFields']

frames = list()
for record in json_data['data']:
    out = pd.DataFrame.from_records(record['customFields']).T
    out = out.reset_index(drop=True)
    out.columns = out.iloc[0]
    out = out.reindex(out.index.drop(0))
    frames.append(out)

custom_fields_frame = pd.concat(frames).reset_index(drop=True)

main_frame = main_frame.join(custom_fields_frame)

print(main_frame)
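A more compact variant of the same idea, offered as a sketch rather than a definitive answer: build one plain dict per record, turning each customFields entry into a customFields_<id> key, and let pandas fill the gaps with NaN. The sample string is a shortened version of the question's data, and the column sorting and the na_rep="null" choice are assumptions made to mirror the idealised table.

```python
import json

import pandas as pd

# Shortened sample in the same shape as the question's data (assumption).
data = (
    '{"data":['
    '{"eId":38344,"fname":"Adon","customFields":'
    '[{"id":3,"value":false},{"id":4,"value":false},'
    '{"id":5,"value":"2056-04-05T00:00:00Z"}]},'
    '{"eId":38344,"fname":"Adon","customFields":'
    '[{"id":3,"value":false},{"id":4,"value":false}]}'
    ']}'
)
json_data = json.loads(data)

rows = []
for record in json_data["data"]:
    # Copy every top-level field except the nested list.
    row = {key: value for key, value in record.items() if key != "customFields"}
    # Turn each {"id": ..., "value": ...} pair into its own column.
    for field in record["customFields"]:
        row["customFields_{}".format(field["id"])] = field["value"]
    rows.append(row)

frame = pd.DataFrame(rows)
# Sort the columns alphabetically, as in the idealised table.
frame = frame.reindex(sorted(frame.columns), axis=1)
print(frame)

# Missing custom fields are NaN in the frame; write them as "null" in the CSV.
csv_text = frame.to_csv(index=False, na_rep="null")
print(csv_text)
```

Because the rows are plain dicts, records with different sets of custom fields need no special handling: any column a record lacks is simply left empty for that row.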

Thanks,

experiment · Accepted Answer · 2018-07-18 10:26:54Z


This solution converts the nested JSON to a dataframe using json_normalize:

nested_json=[{"page":1,"pages":2,"totaItems":22,"data":[{"eId":38344,"bId":29802,"fname":"Adon","cId":21,"cName":"Regional","vType":"None","totalMinutes":590,"minutesExcludingViolations":590,"sId":15,"snme":"CD","customFields":[{"id":3,"value":"false"},{"id":4,"value":"false"},{"id":5,"value":"true"},{"id":6,"value":"false"},{"id":7,"value":"false"},{"id":14,"value":"2056-04-05T00:00:00Z"},{"id":15,"value":"Tester"}]},{"eId":38344,"bId":29802,"fname":"Adon","cId":21,"cName":"Regional","vType":"None","totalMinutes":590,"minutesExcludingViolations":590,"sId":15,"snme":"CD","customFields":[{"id":3,"value":"false"},{"id":5,"value":"true"},{"id":6,"value":"false"},{"id":7,"value":"false"},{"id":14,"value":"2056-04-05T00:00:00Z"},{"id":15,"value":"Tester"},{"id":16,"value":"false"},{"id":17,"value":"false"}]}]}]


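# pandas >= 1.0 exposes json_normalize at the top level;
# pandas.io.json.json_normalize is the older, deprecated location.
from pandas import json_normalize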

json_df = json_normalize(nested_json)


# Keep only the last path component of each column name
# (column_name instead of something.something.column_name), lower-cased.
json_columns = [column.split('.')[-1].lower() for column in json_df.columns]

json_df.columns = json_columns
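Called without extra arguments, json_normalize leaves a list of dicts such as customFields as a single column of lists. A hedged sketch of one way to spread that list into columns is to pass record_path and meta, then pivot the resulting long table. It assumes pandas 1.0 or later for pd.json_normalize; the "row" key and the shortened sample records are hypothetical, added only so that records with identical fields stay distinguishable.

```python
import pandas as pd

# Shortened, hypothetical records in the same shape as nested_json[0]["data"].
data = [
    {"eId": 38344, "fname": "Adon",
     "customFields": [{"id": 3, "value": "false"},
                      {"id": 14, "value": "2056-04-05T00:00:00Z"}]},
    {"eId": 38344, "fname": "Adon",
     "customFields": [{"id": 3, "value": "false"}]},
]

# Tag each record with its position ("row" is an illustrative name).
records = [dict(record, row=i) for i, record in enumerate(data)]

# One line per custom field, carrying the record's row number along.
long_df = pd.json_normalize(records, record_path="customFields", meta=["row"])
long_df["row"] = long_df["row"].astype(int)

# One column per custom field id, one line per original record.
wide = long_df.pivot(index="row", columns="id", values="value")
wide = wide.add_prefix("customFields_")

# Top-level fields, indexed the same way, joined with the pivoted fields.
main = pd.json_normalize(records).drop(columns=["customFields"]).set_index("row")
result = main.join(wide)
print(result)
```

Pivoting on the row number rather than on eId avoids the duplicate-index error that pivot raises when two records share the same identifying fields, as they do in the sample data.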

edited Jul 18, 2018 at 10:26

answered Jul 18, 2018 at 10:07

experiment


1 Comment

ck3mp Over a year ago

This doesn't get the output I require. Thank you though. I have added my idealised output to the question.
