
I am attempting to convert nested JSON data to a flat table:

(Edited: I originally thought I had a working solution and was asking for advice on optimisation, but it turns out it doesn't work.)

import pandas as pd
import json
from collections import OrderedDict


# https://stackoverflow.com/questions/36720940/parsing-nested-json-into-dataframe
def flatten_json(json_object, container=None, name=''):
    if container is None:
        container = OrderedDict()
    if isinstance(json_object, dict):
        for key in json_object:
            flatten_json(json_object[key], container=container, name=name + key + '_')
    elif isinstance(json_object, list):
        for n, item in enumerate(json_object, 1):
            flatten_json(item, container=container, name=name + str(n) + '_')
    else:
        container[str(name[:-1])] = str(json_object)
    return container


data = '{"page":1,"pages":2,"totaItems":22,"data":[{"eId":38344,"bId":29802,"fname":"Adon","cId":21,"cName":"Regional","vType":"None","totalMinutes":590,"minutesExcludingViolations":590,"sId":15,"snme":"CD","customFields":[{"id":3,"value":false},{"id":4,"value":false},{"id":5,"value":"2056-04-05T00:00:00Z"}]},{"eId":38344,"bId":29802,"fname":"Adon","cId":21,"cName":"Regional","vType":"None","totalMinutes":590,"minutesExcludingViolations":590,"sId":15,"snme":"CD","customFields":[{"id":3,"value":false},{"id":4,"value":false}]}]}'

json_data = json.loads(data)

dataframes = list()

for record in json_data['data']:
    out = pd.DataFrame(flatten_json(record), index=[0])
    dataframes.append(out)

frame = pd.concat(dataframes)

print(frame)

However, I can't help but feel this might be overly complicated for what I am trying to achieve. This script is the result of a few hours' research and it's the best I can come up with. Does anyone have any pointers or advice on how to refine it?

I'm essentially completely flattening the JSON data (under the data record) into a dataframe to later be exported to CSV.

Ideal output:

+-------+-----+----------+----------------+----------------+----------------------+-------+-------+----------------------------+-----+------+--------------+-------+
|  bId  | cId |  cName   | customFields_3 | customFields_4 |    customFields_5    |  eId  | fname | minutesExcludingViolations | sId | snme | totalMinutes | vType |
+-------+-----+----------+----------------+----------------+----------------------+-------+-------+----------------------------+-----+------+--------------+-------+
| 29802 |  21 | Regional | FALSE          | FALSE          | 2056-04-05T00:00:00Z | 38344 | Adon  |                        590 |  15 | CD   |          590 | None  |
| 29802 |  21 | Regional | FALSE          | FALSE          | null                 | 38344 | Adon  |                        590 |  15 | CD   |          590 | None  |
+-------+-----+----------+----------------+----------------+----------------------+-------+-------+----------------------------+-----+------+--------------+-------+

EDIT: It turns out the script above doesn't work, which I hadn't noticed at first. I've added my ideal output and shortened the input data slightly to make it easier to work with for now.
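To make the mismatch concrete, here is a small sketch of what the flattener produces for one record. The trimmed-down `record` is an illustrative sample, not the full input. The list position (1, 2, ...) ends up in the key, and `id` and `value` become separate entries, so there is no `customFields_3`-style column:

```python
from collections import OrderedDict


def flatten_json(json_object, container=None, name=''):
    # Same flattener as in the question, repeated so the snippet runs on its own.
    if container is None:
        container = OrderedDict()
    if isinstance(json_object, dict):
        for key in json_object:
            flatten_json(json_object[key], container=container, name=name + key + '_')
    elif isinstance(json_object, list):
        for n, item in enumerate(json_object, 1):
            flatten_json(item, container=container, name=name + str(n) + '_')
    else:
        container[str(name[:-1])] = str(json_object)
    return container


# Trimmed-down record, for illustration only.
record = {
    "eId": 38344,
    "customFields": [
        {"id": 3, "value": False},
        {"id": 5, "value": "2056-04-05T00:00:00Z"},
    ],
}

flat = flatten_json(record)
print(list(flat))
# ['eId', 'customFields_1_id', 'customFields_1_value',
#  'customFields_2_id', 'customFields_2_value']
```

Because the keys depend on position rather than on the `id` value, two records with different sets of custom fields will not line up in the same columns.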

EDIT2: A possible solution, which gives the right output:

main_frame = pd.DataFrame(json_data['data'])
del main_frame['customFields']

frames = list()
for record in json_data['data']:
    out = pd.DataFrame.from_records(record['customFields']).T
    out = out.reset_index(drop=True)
    out.columns = out.iloc[0]
    out = out.reindex(out.index.drop(0))
    frames.append(out)

custom_fields_frame = pd.concat(frames).reset_index(drop=True)

main_frame = main_frame.join(custom_fields_frame)

print(main_frame)
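A more compact sketch of the same idea, assuming every `customFields` entry has exactly the keys `id` and `value`: build one plain dict per record and let pandas align the columns. The `customFields_<id>` naming and the small inline `records` sample are illustrative choices, not the full input.

```python
import pandas as pd

# Small illustrative sample in the same shape as json_data['data'].
records = [
    {"eId": 38344, "fname": "Adon",
     "customFields": [{"id": 3, "value": False},
                      {"id": 5, "value": "2056-04-05T00:00:00Z"}]},
    {"eId": 38344, "fname": "Adon",
     "customFields": [{"id": 3, "value": False}]},
]

rows = []
for record in records:
    # Copy every scalar field, leaving the nested list out.
    row = {key: value for key, value in record.items() if key != "customFields"}
    # Turn each {"id": ..., "value": ...} pair into its own column.
    for field in record.get("customFields", []):
        row["customFields_{}".format(field["id"])] = field["value"]
    rows.append(row)

# Fields missing from a record become NaN in that row.
frame = pd.DataFrame(rows)
frame = frame.reindex(sorted(frame.columns), axis=1)
print(frame)
```

Since the frame is built from a list of dicts, records with different sets of custom fields still share one set of columns, and the result can be passed straight to `to_csv`.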

Thanks,

1 Answer

This solution should do the job efficiently by converting the nested JSON to a DataFrame:

nested_json=[{"page":1,"pages":2,"totaItems":22,"data":[{"eId":38344,"bId":29802,"fname":"Adon","cId":21,"cName":"Regional","vType":"None","totalMinutes":590,"minutesExcludingViolations":590,"sId":15,"snme":"CD","customFields":[{"id":3,"value":"false"},{"id":4,"value":"false"},{"id":5,"value":"true"},{"id":6,"value":"false"},{"id":7,"value":"false"},{"id":14,"value":"2056-04-05T00:00:00Z"},{"id":15,"value":"Tester"}]},{"eId":38344,"bId":29802,"fname":"Adon","cId":21,"cName":"Regional","vType":"None","totalMinutes":590,"minutesExcludingViolations":590,"sId":15,"snme":"CD","customFields":[{"id":3,"value":"false"},{"id":5,"value":"true"},{"id":6,"value":"false"},{"id":7,"value":"false"},{"id":14,"value":"2056-04-05T00:00:00Z"},{"id":15,"value":"Tester"},{"id":16,"value":"false"},{"id":17,"value":"false"}]}]}]


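import pandas as pd

# pd.json_normalize is the top-level name in pandas 1.0 and later; the older
# pandas.io.json.json_normalize import path is deprecated.
json_df = pd.json_normalize(nested_json)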


json_columns = list(json_df.columns.values)

# Keep only the last path component, lower-cased
# (column_name instead of something.something.column_name)

for w in range(len(json_columns)):
    json_columns[w] = json_columns[w].split('.')[-1].lower()

json_df.columns = json_columns

1 Comment

This doesn't get the output I require. Thank you though. I have added my idealised output to the question.
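For context, a minimal sketch of why `json_normalize` alone falls short here, assuming pandas 1.0 or later (for `pd.json_normalize`) and a cut-down `payload` in the same shape as the answer's input: nested dicts are flattened, but list-valued fields are left as lists inside a single column.

```python
import pandas as pd

# Cut-down payload, for illustration only.
payload = [{
    "page": 1,
    "data": [{
        "eId": 38344,
        "customFields": [
            {"id": 3, "value": "false"},
            {"id": 14, "value": "2056-04-05T00:00:00Z"},
        ],
    }],
}]

# Normalising the top level keeps the whole "data" list in one cell.
top_level = pd.json_normalize(payload)
print(top_level.columns.tolist())

# Descending into "data" gives one row per record, but "customFields"
# is still a list of dicts rather than one column per id.
per_record = pd.json_normalize(payload, record_path="data")
print(per_record.columns.tolist())
print(type(per_record.loc[0, "customFields"]))
```

Getting one column per custom field id therefore needs an extra step that maps each `id` to a column name.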
