Normalizing nested JSON object into Pandas dataframe

Question

Background: I am trying to normalize a json file, and save into a pandas dataframe, however I am having issues navigating the json structure and my code isn't working as expected.

Expected dataframe output: Given the following example json file (uses randomized data, but exactly the same format as the real one), this is the output I am trying to produce -

New Entity Group	Entity ID	Adjusted Value (1/31/2022, No Div, USD)	Adjusted TWR (Current Quarter No Div, USD))	Adjusted TWR (YTD, No Div, USD)	Annualized Adjusted TWR (Since Inception, No Div, USD)	Inception Date	Risk Target
Portfolio_1		$260,786	(44.55%)	(44.55%)	(44.55%) *	Apr 7, 2021	N/A
The FW Irrev Family Tr	9552252	$260,786	0.00%	0.00%	0.00% *	Jan 11, 2022	N/A
Portfolio_2		$18,396,664	(5.78%)	(5.78%)	(5.47%) *	Sep 3, 2021	Growth
FW DAF	10946585	$18,396,664	(5.78%)	(5.78%)	(5.47%) *	Sep 3, 2021	Growth
Portfolio_3		$60,143,818	(4.42%)	(4.42%)	7.75% *	Dec 17, 2020	-
The FW Family Trust	13014080	$475,356	(6.10%)	(6.10%)	(3.97%) *	Apr 9, 2021	Aggressive
FW Liquid Fund LP	13396796	$52,899,527	(4.15%)	(4.15%)	(4.15%) *	Dec 30, 2021	Aggressive
FW Holdings No. 2 LLC	8413655	$6,768,937	(0.77%)	(0.77%)	11.84% *	Mar 5, 2021	N/A
FW and FR Joint	9957007	($1)	-	-	- *	Dec 21, 2021	N/A

Actual dataframe output: despite my best efforts, I have only been able to get bolded rows to map into the dataframe:

New Entity Group	Adjusted Value (1/31/2022, No Div, USD)	Adjusted TWR (Current Quarter No Div, USD))	Adjusted TWR (YTD, No Div, USD)	Annualized Adjusted TWR (Since Inception, No Div, USD)	Inception Date	Risk Target
Portfolio_1	$260,786	(44.55%)	(44.55%)	(44.55%) *	Apr 7, 2021	N/A
Portfolio_2	$18,396,664	(5.78%)	(5.78%)	(5.47%) *	Sep 3, 2021	Growth
Portfolio_3	$60,143,818	(4.42%)	(4.42%)	7.75% *	Dec 17, 2020	-

JSON file: this is the file I am trying to normalize and map into a dataframe:

{
    "meta": {
        "columns": [
            {
                "key": "node_id",
                "display_name": "Entity ID",
                "output_type": "Word"
            },
            {
                "key": "value",
                "display_name": "Adjusted Value (1/31/2022, No Div, USD)",
                "output_type": "Number",
                "currency": "USD"
            },
            {
                "key": "time_weighted_return",
                "display_name": "Adjusted TWR (Current Quarter, No Div, USD)",
                "output_type": "Percent",
                "currency": "USD"
            },
            {
                "key": "time_weighted_return_2",
                "display_name": "Adjusted TWR (YTD, No Div, USD)",
                "output_type": "Percent",
                "currency": "USD"
            },
            {
                "key": "time_weighted_return_3",
                "display_name": "Annualized Adjusted TWR (Since Inception, No Div, USD)",
                "output_type": "Percent",
                "currency": "USD"
            },
            {
                "key": "inception_event_date",
                "display_name": "Inception Date",
                "output_type": "Date"
            },
            {
                "key": "_custom_portfolio_target_347209",
                "display_name": "Risk Target",
                "output_type": "Word"
            }
        ],
        "groupings": [
            {
                "key": "_custom_new_entity_group_453577",
                "display_name": "NEW Entity Group"
            },
            {
                "key": "top_level_legal_entity",
                "display_name": "Top Level Legal Entity"
            }
        ]
    },
    "data": {
        "type": "portfolio_views",
        "attributes": {
            "total": {
                "name": "Total",
                "columns": {
                    "time_weighted_return": -0.05001974888806926,
                    "inception_event_date": "2020-12-17",
                    "_custom_portfolio_target_347209": null,
                    "time_weighted_return_3": 0.0678647066340392,
                    "time_weighted_return_2": -0.05001974888806926,
                    "value": 7.880126780581851E7,
                    "node_id": null
                },
                "children": [
                    {
                        "name": "Portfolio_3",
                        "grouping": "_custom_new_entity_group_453577",
                        "columns": {
                            "time_weighted_return": -0.04420061615233983,
                            "inception_event_date": "2020-12-17",
                            "_custom_portfolio_target_347209": null,
                            "time_weighted_return_3": 0.07748325432684622,
                            "time_weighted_return_2": -0.04420061615233983,
                            "value": 6.014381761929752E7,
                            "node_id": null
                        },
                        "children": [
                            {
                                "entity_id": 9957007,
                                "name": "FW and FR Joint",
                                "grouping": "top_level_legal_entity",
                                "columns": {
                                    "time_weighted_return": null,
                                    "inception_event_date": "2021-12-21",
                                    "_custom_portfolio_target_347209": "N/A",
                                    "time_weighted_return_3": null,
                                    "time_weighted_return_2": null,
                                    "value": -1.44,
                                    "node_id": "9957007"
                                },
                                "children": []
                            },
                            {
                                "entity_id": 8413655,
                                "name": "FW Holdings No. 2 LLC",
                                "grouping": "top_level_legal_entity",
                                "columns": {
                                    "time_weighted_return": -0.0077309266066708515,
                                    "inception_event_date": "2021-03-05",
                                    "_custom_portfolio_target_347209": "N/A",
                                    "time_weighted_return_3": 0.11844843557716445,
                                    "time_weighted_return_2": -0.0077309266066708515,
                                    "value": 6768936.74,
                                    "node_id": "8413655"
                                },
                                "children": []
                            },
                            {
                                "entity_id": 13396796,
                                "name": "FW Liquid Fund LP",
                                "grouping": "top_level_legal_entity",
                                "columns": {
                                    "time_weighted_return": -0.04149769229150746,
                                    "inception_event_date": "2021-12-30",
                                    "_custom_portfolio_target_347209": "Aggressive",
                                    "time_weighted_return_3": -0.041497430478377395,
                                    "time_weighted_return_2": -0.04149769229150746,
                                    "value": 5.289952672686747E7,
                                    "node_id": "13396796"
                                },
                                "children": []
                            },
                            {
                                "entity_id": 13014080,
                                "name": "The FW Family Trust",
                                "grouping": "top_level_legal_entity",
                                "columns": {
                                    "time_weighted_return": -0.06102013456998856,
                                    "inception_event_date": "2021-04-09",
                                    "_custom_portfolio_target_347209": "Aggressive",
                                    "time_weighted_return_3": -0.039685671858585514,
                                    "time_weighted_return_2": -0.06102013456998856,
                                    "value": 475355.59242999996,
                                    "node_id": "13014080"
                                },
                                "children": []
                            }
                        ]
                    },
                    {
                        "name": "Portfolio_1",
                        "grouping": "_custom_new_entity_group_453577",
                        "columns": {
                            "time_weighted_return": -0.44554958179309,
                            "inception_event_date": "2021-04-07",
                            "_custom_portfolio_target_347209": "N/A",
                            "time_weighted_return_3": -0.44554958179309,
                            "time_weighted_return_2": -0.44554958179309,
                            "value": 260786.03,
                            "node_id": null
                        },
                        "children": [
                            {
                                "entity_id": 9552252,
                                "name": "The FW Irrev Family Tr",
                                "grouping": "top_level_legal_entity",
                                "columns": {
                                    "time_weighted_return": 0.0,
                                    "inception_event_date": "2022-01-11",
                                    "_custom_portfolio_target_347209": "N/A",
                                    "time_weighted_return_3": 0.0,
                                    "time_weighted_return_2": 0.0,
                                    "value": 260786.03,
                                    "node_id": "9552252"
                                },
                                "children": []
                            }
                        ]
                    },
                    {
                        "name": "Portfolio_2",
                        "grouping": "_custom_new_entity_group_453577",
                        "columns": {
                            "time_weighted_return": -0.05780354507057972,
                            "inception_event_date": "2021-09-03",
                            "_custom_portfolio_target_347209": "Growth",
                            "time_weighted_return_3": -0.05470214863844658,
                            "time_weighted_return_2": -0.05780354507057972,
                            "value": 1.8396664156520825E7,
                            "node_id": null
                        },
                        "children": [
                            {
                                "entity_id": 10946585,
                                "name": "FW DAF",
                                "grouping": "top_level_legal_entity",
                                "columns": {
                                    "time_weighted_return": -0.05780354507057972,
                                    "inception_event_date": "2021-09-03",
                                    "_custom_portfolio_target_347209": "Growth",
                                    "time_weighted_return_3": -0.05470214863844658,
                                    "time_weighted_return_2": -0.05780354507057972,
                                    "value": 1.8396664156520832E7,
                                    "node_id": "10946585"
                                },
                                "children": []
                            }
                        ]
                    }
                ]
            }
        }
    },
    "included": []
}

My code: this is the function, which I built to try and normalize the JSON response and save in a pandas dataframe -

def unpack_response():
    while True:
        try:    
            api_response = response_writer()
            df = pd.json_normalize(api_response['data']['attributes']['total']['children'])
            df.columns = df.columns.str.replace(r'columns.', '', regex=False)
            column_name_mapper = {column['key']: column['display_name'] for column in api_response['meta']['columns']}
            df.rename(columns=column_name_mapper, inplace=True)
            break
        except KeyError:
            print("-----------------------------------\n","API TIMEOUT ERROR: TRYING AGAIN...", "\n-----------------------------------\n")
    
    df.rename(columns={'name': 'New Entity Group'}, inplace=True)

    column_names = ["New Entity Group", "Entity ID", "Adjusted Value (1/31/2022, No Div, USD)", "Adjusted TWR (Current Quarter, No Div, USD)", "Adjusted TWR (YTD, No Div, USD)", "Annualized Adjusted TWR (Since Inception, No Div, USD)", "Inception Date"]
    df = df.reindex(columns=column_names)
    
    return df
unpack_response()

Comment about my code:

Portfolio_1, Portfolio_2, Portfolio_3 - these bolded rows are first level of children of data and seem to be the only rows which are saving to the df. I think this is because my code references df = pd.json_normalize(api_response['data']['attributes']['total']['children']) so is only looking at these lists. I tried just appending ['children']['children'] to the end of that code snippet (given there are 3x level of children, but received a TypeError: list indices must be integers or slices, not str.

I would be grateful for any suggestions on how I can improve or add to my function, so I can tap into the key:pair values, which are the 2x lower of the children levels.

If I may ask, are you waiting for more answers? Does any particular answer seem better to you? — user17242583
– user17242583, Commented Feb 6, 2022 at 15:46
I apologize to all those who took incredible effort to contribute. I’ve spent the weekend reviewing and understanding all the code, before rewarding the bounty. I will do this shortly. Thank you everyone for being so generous with your answers and clear explanations. — William
– William, Commented Feb 7, 2022 at 14:38

ti7 · Accepted Answer · 2022-02-03 16:48:52Z

2

This looks like you are trying to create and then stack three dataframes, which you may not really want to do or may be better achieved by mapping each Porfolio_ to every relevant line and then either

import itertools
...
portfolio_views_children = response['data']['attributes']['total']['children']

portfolios = []
for portfolio in portfolio_views_children:
    entity_columns = []
    # include portfolio itself within an iterable so the total is the header
    for entity in itertools.chain([portfolio], portfolio["children"]):
        entity_data = entity["columns"].copy()  # don't mutate original response
        entity_data["portfolio"] = portfolio["name"]   # from outer
        entity_data["name"]      = entity["name"]
        entity_columns.append(entity_data)

    df = pd.DataFrame(entity_columns)
    portfolios.append(df)

# combine dataframes
df = pd.concat(portfolios)
# reorder and rename
column_ordering = {"portfolio": "portfolio", "name": "name"}
column_ordering.update({c["key"]: c["display_name"] for c in response["meta"]["columns"]})
df = df[column_ordering.keys()]   # beware: un-named cols will be dropped
df = df.rename(columns=column_ordering)

print(df.to_markdown(index=False))  # create output below (requires tabulate)

portfolio	name	Entity ID	Adjusted Value (1/31/2022, No Div, USD)	Adjusted TWR (Current Quarter, No Div, USD)	Adjusted TWR (YTD, No Div, USD)	Annualized Adjusted TWR (Since Inception, No Div, USD)	Inception Date	Risk Target
Portfolio_3	Portfolio_3		6.01438e+07	-0.0442006	-0.0442006	0.0774833	2020-12-17
Portfolio_3	FW and FR Joint	9957007	-1.44	nan	nan	nan	2021-12-21	N/A
Portfolio_3	FW Holdings No. 2 LLC	8413655	6.76894e+06	-0.00773093	-0.00773093	0.118448	2021-03-05	N/A
Portfolio_3	FW Liquid Fund LP	13396796	5.28995e+07	-0.0414977	-0.0414977	-0.0414974	2021-12-30	Aggressive
Portfolio_3	The FW Family Trust	13014080	475356	-0.0610201	-0.0610201	-0.0396857	2021-04-09	Aggressive
Portfolio_1	Portfolio_1		260786	-0.44555	-0.44555	-0.44555	2021-04-07	N/A
Portfolio_1	The FW Irrev Family Tr	9552252	260786	0	0	0	2022-01-11	N/A
Portfolio_2	Portfolio_2		1.83967e+07	-0.0578035	-0.0578035	-0.0547021	2021-09-03	Growth
Portfolio_2	FW DAF	10946585	1.83967e+07	-0.0578035	-0.0578035	-0.0547021	2021-09-03	Growth

answered Feb 3, 2022 at 16:48

ti7

19.8k8 gold badges50 silver badges87 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

William Over a year ago

Please accept my apologies for the delay thanking you for this contribution. My only question is - how would you expand this code to accommodate another hierarchy of children, all containing exactly the same data as the level above?

ti7 Over a year ago

@William you should be able to continue adding them the for loop over itertools.chain - additionally, if you find you have a great or unknown depth of the same structure, you may find walking the structure recursively, calling isinstance and iterating for (list, dict) and checking tree nodes works very well (adding blocks that match to a collection to then iterate over)

William Over a year ago

thank you. I tried for entity in itertools.chain([portfolio], portfolio["children"],["children"]): however just got a 'TypeError: string indices must be integers'. Which is referencing entity_data = entity["columns"].copy() Did I approach this wrong?

William Over a year ago

To give additional context - I am able to access all levels by doing a simple

df = json_normalize(api_response['data']['attributes']['total']['children'], record_path = ['children', 'children', 'children'])

@ti7

Emma · Accepted Answer · 2022-02-04 15:02:25Z

2

Since your children's children has same structure as children, you can try using json_normalize twice separately and append it together.

# For first layer that includes Portfolio_1, Portfolio_2, Portfolio_3
df = pd.json_normalize(s, record_path=['data', 'attributes', 'total', 'children'])

# For second layer that includes The FW Irrev Family Tr, etc
# Use explode to convert list into rows
df_child = pd.json_normalize(df.explode('children').children)

# Combine both
df = pd.concat([df, df_child])

# You can use your column renaming and filtering

edited Feb 4, 2022 at 15:02

answered Feb 3, 2022 at 16:25

Emma

9,5781 gold badge22 silver badges38 bronze badges

Comments

quasi-human · Accepted Answer · 2022-02-05 07:30:21Z

I prefer to use json_normalize. The following code doesn't deal with error handling, detailed formatting, etc, whereas I think the essence of what you want to do the most is included.

Code:

import json
import pandas as pd

# You have to change this path according to the actual json file location.
with open('./api_response.json', 'r') as f:
    api_response = json.load(f)

def unpack_response(r):
    df = pd.DataFrame()

    df_src = pd.json_normalize(r, record_path=['data', 'attributes', 'total', 'children'])
    for _, row in df_src.sort_values('name').iterrows(): 
        df_p = pd.DataFrame(row).T
        df_c = pd.json_normalize(row.children)

        # I'm not sure what your expected sorting order is. Perhaps you might want to delete the next line.
        df_c = df_c.sort_values(['columns._custom_portfolio_target_347209', 'columns.inception_event_date'])

        df = pd.concat([df, df_p, df_c], axis=0, ignore_index=True)

    column_name_mapper = {'columns.' + column['key']: column['display_name'] for column in api_response['meta']['columns']}
    column_name_mapper.update({'name': 'New Entity Group'})
    column_names = ["New Entity Group", "Entity ID", "Adjusted Value (1/31/2022, No Div, USD)", "Adjusted TWR (Current Quarter, No Div, USD)", "Adjusted TWR (YTD, No Div, USD)", "Annualized Adjusted TWR (Since Inception, No Div, USD)", "Inception Date", "Risk Target"]
    df = df.rename(columns=column_name_mapper).reindex(columns=column_names)

    return df

df = unpack_response(api_response)

Output:

New Entity Group	Entity ID	Adjusted Value (1/31/2022, No Div, USD)	Adjusted TWR (Current Quarter, No Div, USD)	Adjusted TWR (YTD, No Div, USD)	Annualized Adjusted TWR (Since Inception, No Div, USD)	Inception Date	Risk Target
Portfolio_1		260786	-0.44555	-0.44555	-0.44555	2021-04-07	N/A
The FW Irrev Family Tr	9552252	260786	0	0	0	2022-01-11	N/A
Portfolio_2		1.83967e+07	-0.0578035	-0.0578035	-0.0547021	2021-09-03	Growth
FW DAF	10946585	1.83967e+07	-0.0578035	-0.0578035	-0.0547021	2021-09-03	Growth
Portfolio_3		6.01438e+07	-0.0442006	-0.0442006	0.0774833	2020-12-17
The FW Family Trust	13014080	475356	-0.0610201	-0.0610201	-0.0396857	2021-04-09	Aggressive
FW Liquid Fund LP	13396796	5.28995e+07	-0.0414977	-0.0414977	-0.0414974	2021-12-30	Aggressive
FW Holdings No. 2 LLC	8413655	6.76894e+06	-0.00773093	-0.00773093	0.118448	2021-03-05	N/A
FW and FR Joint	9957007	-1.44	nan	nan	nan	2021-12-21	N/A

score 1 · Accepted Answer · 2022-02-04 16:49:14Z

Personally, I would not use pd.json_normalize for this case. Your JSON is quite complex, and unless you're really experienced with json_normalize, the following code may take less time to understand for the average dev. In fact, you don't even need to see the JSON to understand exactly what this code does (although it would certainly help ;).

First, we can extract the objects (portfolios and their children) from the JSON into a list, and use a series of steps to get them in the right form and order:

def prep_obj(o):
    """Prepares an object (portfolio/child) from the JSON to be inserted into a dataframe."""
    return {
        'New Entity Group': o['name'],
    } | o['columns']


# Get a list of lists, where each sub-list contains the portfolio object at index 0 and then the portfolio object's children:
groups = [[prep_obj(o), *[prep_obj(child) for child in o['children']]] for o in api_response['data']['attributes']['total']['children']]

# Sort the portfolio groups by their number:
groups.sort(key=lambda g: int(g[0]['New Entity Group'].split('_')[1]))

# Reverse the children of each portfolio group:
groups = [[g[0]] + g[1:][::-1] for g in groups]

# Flatten out the groups into one large list of objects:
objects = [obj for group in groups for obj in group]
# The above is exactly equivalent to the following:
#   objects = []
#   for group in groups:
#       for obj in group:
#           objects.append(obj)

Next, create the dataframe:

# Create a mapping for column names so that their display names can be used:
mapping = {col['key']: col['display_name'] for col in api_response['meta']['columns']}

# Create a dataframe from the list of objects:
df = pd.DataFrame(objects)

# Correct column names:
df = df.rename(mapping, axis=1)
# Reorder columns:
column_names = ["New Entity Group", "Entity ID", "Adjusted Value (1/31/2022, No Div, USD)", "Adjusted TWR (Current Quarter, No Div, USD)", "Adjusted TWR (YTD, No Div, USD)", "Annualized Adjusted TWR (Since Inception, No Div, USD)", "Inception Date", "Risk Target"]
df = df[column_names]

And formatting:

def format_twr_col(col):
    return (
        col
        .abs()
        .mul(100)
        .round(2)
        .pipe(lambda s: s.where(s.eq(0) | s.isna(), '(' + s.astype(str) + '%)'))
        .pipe(lambda s: s.where(s.ne(0) | s.isna(), s.astype(str) + '%'))
        .fillna('-')
    )

def format_value_col(col):
    positive_mask = col.ge(0)

    col[positive_mask] = (
        col[positive_mask]
        .round()
        .astype(int)
        .map('${:,}'.format)
    )

    col[~positive_mask] = (
        col[~positive_mask]
        .astype(float)
        .round()
        .astype(int)
        .abs()
        .map('(${:,})'.format)
    )
    
    return col

df['Adjusted TWR (Current Quarter, No Div, USD)'] = format_twr_col(df['Adjusted TWR (Current Quarter, No Div, USD)'])
df['Annualized Adjusted TWR (Since Inception, No Div, USD)'] = format_twr_col(df['Annualized Adjusted TWR (Since Inception, No Div, USD)'])
df['Adjusted TWR (YTD, No Div, USD)'] = format_twr_col(df['Adjusted TWR (YTD, No Div, USD)'])

df['Adjusted Value (1/31/2022, No Div, USD)'] = format_value_col(df['Adjusted Value (1/31/2022, No Div, USD)'].copy())

df['Inception Date'] = pd.to_datetime(df['Inception Date']).dt.strftime('%b %d, %Y')

df['Entity ID'] = df['Entity ID'].fillna('')

And... voilà:

>>> pd.options.display.max_columns = None
>>> df
         New Entity Group Entity ID Adjusted Value (1/31/2022, No Div, USD)  Adjusted TWR (Current Quarter, No Div, USD) Adjusted TWR (YTD, No Div, USD)  Annualized Adjusted TWR (Since Inception, No Div, USD) Inception Date  Risk Target
0             Portfolio_1                                          $260,786                                     (44.55%)                        (44.55%)                                            (44.55%)       Apr 07, 2021          N/A
1  The FW Irrev Family Tr   9552252                                $260,786                                         0.0%                            0.0%                                                0.0%       Jan 11, 2022          N/A
2             Portfolio_2                                       $18,396,664                                      (5.78%)                         (5.78%)                                             (5.47%)       Sep 03, 2021       Growth
3                  FW DAF  10946585                             $18,396,664                                      (5.78%)                         (5.78%)                                             (5.47%)       Sep 03, 2021       Growth
4             Portfolio_3                                       $60,143,818                                      (4.42%)                         (4.42%)                                             (7.75%)       Dec 17, 2020          NaN
5     The FW Family Trust  13014080                                $475,356                                       (6.1%)                          (6.1%)                                             (3.97%)       Apr 09, 2021   Aggressive
6       FW Liquid Fund LP  13396796                             $52,899,527                                      (4.15%)                         (4.15%)                                             (4.15%)       Dec 30, 2021   Aggressive
7   FW Holdings No. 2 LLC   8413655                              $6,768,937                                      (0.77%)                         (0.77%)                                            (11.84%)       Mar 05, 2021          N/A
8         FW and FR Joint   9957007                                    ($1)                                            -                               -                                                   -       Dec 21, 2021          N/A

I appreciate there is no obligation to reply, however how would I go a level deeper i.e., children contained children, that contained children. I tried something api_response['data']['attributes']['total']['children']['children] but as you can imagine, it didn't go well!
Hi @William, it's really late here, I'll try to get to you tomorrow. Unfortunately I'm off on a trip tomorrow for a week, so I may not get back until then :/

Collectives™ on Stack Overflow

Normalizing nested JSON object into Pandas dataframe

4 Answers 4

4 Comments

Comments

Code:

Output:

Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

4 Comments

Comments

Code:

Output:

Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related