6

Background: I am trying to normalize a json file, and save into a pandas dataframe, however I am having issues navigating the json structure and my code isn't working as expected.

Expected dataframe output: Given the following example json file (uses randomized data, but exactly the same format as the real one), this is the output I am trying to produce -

New Entity Group Entity ID Adjusted Value
(1/31/2022, No Div, USD)
Adjusted TWR
(Current Quarter No Div, USD))
Adjusted TWR
(YTD, No Div, USD)
Annualized Adjusted TWR
(Since Inception, No Div, USD)
Inception Date Risk Target
Portfolio_1 $260,786 (44.55%) (44.55%) (44.55%) * Apr 7, 2021 N/A
The FW Irrev Family Tr 9552252 $260,786 0.00% 0.00% 0.00% * Jan 11, 2022 N/A
Portfolio_2 $18,396,664 (5.78%) (5.78%) (5.47%) * Sep 3, 2021 Growth
FW DAF 10946585 $18,396,664 (5.78%) (5.78%) (5.47%) * Sep 3, 2021 Growth
Portfolio_3 $60,143,818 (4.42%) (4.42%) 7.75% * Dec 17, 2020 -
The FW Family Trust 13014080 $475,356 (6.10%) (6.10%) (3.97%) * Apr 9, 2021 Aggressive
FW Liquid Fund LP 13396796 $52,899,527 (4.15%) (4.15%) (4.15%) * Dec 30, 2021 Aggressive
FW Holdings No. 2 LLC 8413655 $6,768,937 (0.77%) (0.77%) 11.84% * Mar 5, 2021 N/A
FW and FR Joint 9957007 ($1) - - - * Dec 21, 2021 N/A

Actual dataframe output: despite my best efforts, I have only been able to get bolded rows to map into the dataframe:

New Entity Group Entity ID Adjusted Value
(1/31/2022, No Div, USD)
Adjusted TWR
(Current Quarter No Div, USD))
Adjusted TWR
(YTD, No Div, USD)
Annualized Adjusted TWR
(Since Inception, No Div, USD)
Inception Date Risk Target
Portfolio_1 $260,786 (44.55%) (44.55%) (44.55%) * Apr 7, 2021 N/A
Portfolio_2 $18,396,664 (5.78%) (5.78%) (5.47%) * Sep 3, 2021 Growth
Portfolio_3 $60,143,818 (4.42%) (4.42%) 7.75% * Dec 17, 2020 -

JSON file: this is the file I am trying to normalize and map into a dataframe:

{
    "meta": {
        "columns": [
            {
                "key": "node_id",
                "display_name": "Entity ID",
                "output_type": "Word"
            },
            {
                "key": "value",
                "display_name": "Adjusted Value (1/31/2022, No Div, USD)",
                "output_type": "Number",
                "currency": "USD"
            },
            {
                "key": "time_weighted_return",
                "display_name": "Adjusted TWR (Current Quarter, No Div, USD)",
                "output_type": "Percent",
                "currency": "USD"
            },
            {
                "key": "time_weighted_return_2",
                "display_name": "Adjusted TWR (YTD, No Div, USD)",
                "output_type": "Percent",
                "currency": "USD"
            },
            {
                "key": "time_weighted_return_3",
                "display_name": "Annualized Adjusted TWR (Since Inception, No Div, USD)",
                "output_type": "Percent",
                "currency": "USD"
            },
            {
                "key": "inception_event_date",
                "display_name": "Inception Date",
                "output_type": "Date"
            },
            {
                "key": "_custom_portfolio_target_347209",
                "display_name": "Risk Target",
                "output_type": "Word"
            }
        ],
        "groupings": [
            {
                "key": "_custom_new_entity_group_453577",
                "display_name": "NEW Entity Group"
            },
            {
                "key": "top_level_legal_entity",
                "display_name": "Top Level Legal Entity"
            }
        ]
    },
    "data": {
        "type": "portfolio_views",
        "attributes": {
            "total": {
                "name": "Total",
                "columns": {
                    "time_weighted_return": -0.05001974888806926,
                    "inception_event_date": "2020-12-17",
                    "_custom_portfolio_target_347209": null,
                    "time_weighted_return_3": 0.0678647066340392,
                    "time_weighted_return_2": -0.05001974888806926,
                    "value": 7.880126780581851E7,
                    "node_id": null
                },
                "children": [
                    {
                        "name": "Portfolio_3",
                        "grouping": "_custom_new_entity_group_453577",
                        "columns": {
                            "time_weighted_return": -0.04420061615233983,
                            "inception_event_date": "2020-12-17",
                            "_custom_portfolio_target_347209": null,
                            "time_weighted_return_3": 0.07748325432684622,
                            "time_weighted_return_2": -0.04420061615233983,
                            "value": 6.014381761929752E7,
                            "node_id": null
                        },
                        "children": [
                            {
                                "entity_id": 9957007,
                                "name": "FW and FR Joint",
                                "grouping": "top_level_legal_entity",
                                "columns": {
                                    "time_weighted_return": null,
                                    "inception_event_date": "2021-12-21",
                                    "_custom_portfolio_target_347209": "N/A",
                                    "time_weighted_return_3": null,
                                    "time_weighted_return_2": null,
                                    "value": -1.44,
                                    "node_id": "9957007"
                                },
                                "children": []
                            },
                            {
                                "entity_id": 8413655,
                                "name": "FW Holdings No. 2 LLC",
                                "grouping": "top_level_legal_entity",
                                "columns": {
                                    "time_weighted_return": -0.0077309266066708515,
                                    "inception_event_date": "2021-03-05",
                                    "_custom_portfolio_target_347209": "N/A",
                                    "time_weighted_return_3": 0.11844843557716445,
                                    "time_weighted_return_2": -0.0077309266066708515,
                                    "value": 6768936.74,
                                    "node_id": "8413655"
                                },
                                "children": []
                            },
                            {
                                "entity_id": 13396796,
                                "name": "FW Liquid Fund LP",
                                "grouping": "top_level_legal_entity",
                                "columns": {
                                    "time_weighted_return": -0.04149769229150746,
                                    "inception_event_date": "2021-12-30",
                                    "_custom_portfolio_target_347209": "Aggressive",
                                    "time_weighted_return_3": -0.041497430478377395,
                                    "time_weighted_return_2": -0.04149769229150746,
                                    "value": 5.289952672686747E7,
                                    "node_id": "13396796"
                                },
                                "children": []
                            },
                            {
                                "entity_id": 13014080,
                                "name": "The FW Family Trust",
                                "grouping": "top_level_legal_entity",
                                "columns": {
                                    "time_weighted_return": -0.06102013456998856,
                                    "inception_event_date": "2021-04-09",
                                    "_custom_portfolio_target_347209": "Aggressive",
                                    "time_weighted_return_3": -0.039685671858585514,
                                    "time_weighted_return_2": -0.06102013456998856,
                                    "value": 475355.59242999996,
                                    "node_id": "13014080"
                                },
                                "children": []
                            }
                        ]
                    },
                    {
                        "name": "Portfolio_1",
                        "grouping": "_custom_new_entity_group_453577",
                        "columns": {
                            "time_weighted_return": -0.44554958179309,
                            "inception_event_date": "2021-04-07",
                            "_custom_portfolio_target_347209": "N/A",
                            "time_weighted_return_3": -0.44554958179309,
                            "time_weighted_return_2": -0.44554958179309,
                            "value": 260786.03,
                            "node_id": null
                        },
                        "children": [
                            {
                                "entity_id": 9552252,
                                "name": "The FW Irrev Family Tr",
                                "grouping": "top_level_legal_entity",
                                "columns": {
                                    "time_weighted_return": 0.0,
                                    "inception_event_date": "2022-01-11",
                                    "_custom_portfolio_target_347209": "N/A",
                                    "time_weighted_return_3": 0.0,
                                    "time_weighted_return_2": 0.0,
                                    "value": 260786.03,
                                    "node_id": "9552252"
                                },
                                "children": []
                            }
                        ]
                    },
                    {
                        "name": "Portfolio_2",
                        "grouping": "_custom_new_entity_group_453577",
                        "columns": {
                            "time_weighted_return": -0.05780354507057972,
                            "inception_event_date": "2021-09-03",
                            "_custom_portfolio_target_347209": "Growth",
                            "time_weighted_return_3": -0.05470214863844658,
                            "time_weighted_return_2": -0.05780354507057972,
                            "value": 1.8396664156520825E7,
                            "node_id": null
                        },
                        "children": [
                            {
                                "entity_id": 10946585,
                                "name": "FW DAF",
                                "grouping": "top_level_legal_entity",
                                "columns": {
                                    "time_weighted_return": -0.05780354507057972,
                                    "inception_event_date": "2021-09-03",
                                    "_custom_portfolio_target_347209": "Growth",
                                    "time_weighted_return_3": -0.05470214863844658,
                                    "time_weighted_return_2": -0.05780354507057972,
                                    "value": 1.8396664156520832E7,
                                    "node_id": "10946585"
                                },
                                "children": []
                            }
                        ]
                    }
                ]
            }
        }
    },
    "included": []
}

My code: this is the function, which I built to try and normalize the JSON response and save in a pandas dataframe -

def unpack_response():
    while True:
        try:    
            api_response = response_writer()
            df = pd.json_normalize(api_response['data']['attributes']['total']['children'])
            df.columns = df.columns.str.replace(r'columns.', '', regex=False)
            column_name_mapper = {column['key']: column['display_name'] for column in api_response['meta']['columns']}
            df.rename(columns=column_name_mapper, inplace=True)
            break
        except KeyError:
            print("-----------------------------------\n","API TIMEOUT ERROR: TRYING AGAIN...", "\n-----------------------------------\n")
    
    df.rename(columns={'name': 'New Entity Group'}, inplace=True)

    column_names = ["New Entity Group", "Entity ID", "Adjusted Value (1/31/2022, No Div, USD)", "Adjusted TWR (Current Quarter, No Div, USD)", "Adjusted TWR (YTD, No Div, USD)", "Annualized Adjusted TWR (Since Inception, No Div, USD)", "Inception Date"]
    df = df.reindex(columns=column_names)
    
    return df
unpack_response()

Comment about my code:

  • Portfolio_1, Portfolio_2, Portfolio_3 - these bolded rows are first level of children of data and seem to be the only rows which are saving to the df. I think this is because my code references df = pd.json_normalize(api_response['data']['attributes']['total']['children']) so is only looking at these lists. I tried just appending ['children']['children'] to the end of that code snippet (given there are 3x level of children, but received a TypeError: list indices must be integers or slices, not str.

I would be grateful for any suggestions on how I can improve or add to my function, so I can tap into the key:pair values, which are the 2x lower of the children levels.

2
  • If I may ask, are you waiting for more answers? Does any particular answer seem better to you? Commented Feb 6, 2022 at 15:46
  • 1
    I apologize to all those who took incredible effort to contribute. I’ve spent the weekend reviewing and understanding all the code, before rewarding the bounty. I will do this shortly. Thank you everyone for being so generous with your answers and clear explanations. Commented Feb 7, 2022 at 14:38

4 Answers 4

2

This looks like you are trying to create and then stack three dataframes, which you may not really want to do or may be better achieved by mapping each Porfolio_ to every relevant line and then either

import itertools
...
portfolio_views_children = response['data']['attributes']['total']['children']

portfolios = []
for portfolio in portfolio_views_children:
    entity_columns = []
    # include portfolio itself within an iterable so the total is the header
    for entity in itertools.chain([portfolio], portfolio["children"]):
        entity_data = entity["columns"].copy()  # don't mutate original response
        entity_data["portfolio"] = portfolio["name"]   # from outer
        entity_data["name"]      = entity["name"]
        entity_columns.append(entity_data)

    df = pd.DataFrame(entity_columns)
    portfolios.append(df)

# combine dataframes
df = pd.concat(portfolios)
# reorder and rename
column_ordering = {"portfolio": "portfolio", "name": "name"}
column_ordering.update({c["key"]: c["display_name"] for c in response["meta"]["columns"]})
df = df[column_ordering.keys()]   # beware: un-named cols will be dropped
df = df.rename(columns=column_ordering)

print(df.to_markdown(index=False))  # create output below (requires tabulate)
portfolio name Entity ID Adjusted Value (1/31/2022, No Div, USD) Adjusted TWR (Current Quarter, No Div, USD) Adjusted TWR (YTD, No Div, USD) Annualized Adjusted TWR (Since Inception, No Div, USD) Inception Date Risk Target
Portfolio_3 Portfolio_3 6.01438e+07 -0.0442006 -0.0442006 0.0774833 2020-12-17
Portfolio_3 FW and FR Joint 9957007 -1.44 nan nan nan 2021-12-21 N/A
Portfolio_3 FW Holdings No. 2 LLC 8413655 6.76894e+06 -0.00773093 -0.00773093 0.118448 2021-03-05 N/A
Portfolio_3 FW Liquid Fund LP 13396796 5.28995e+07 -0.0414977 -0.0414977 -0.0414974 2021-12-30 Aggressive
Portfolio_3 The FW Family Trust 13014080 475356 -0.0610201 -0.0610201 -0.0396857 2021-04-09 Aggressive
Portfolio_1 Portfolio_1 260786 -0.44555 -0.44555 -0.44555 2021-04-07 N/A
Portfolio_1 The FW Irrev Family Tr 9552252 260786 0 0 0 2022-01-11 N/A
Portfolio_2 Portfolio_2 1.83967e+07 -0.0578035 -0.0578035 -0.0547021 2021-09-03 Growth
Portfolio_2 FW DAF 10946585 1.83967e+07 -0.0578035 -0.0578035 -0.0547021 2021-09-03 Growth
Sign up to request clarification or add additional context in comments.

4 Comments

Please accept my apologies for the delay thanking you for this contribution. My only question is - how would you expand this code to accommodate another hierarchy of children, all containing exactly the same data as the level above?
@William you should be able to continue adding them the for loop over itertools.chain - additionally, if you find you have a great or unknown depth of the same structure, you may find walking the structure recursively, calling isinstance and iterating for (list, dict) and checking tree nodes works very well (adding blocks that match to a collection to then iterate over)
thank you. I tried for entity in itertools.chain([portfolio], portfolio["children"],["children"]): however just got a 'TypeError: string indices must be integers'. Which is referencing entity_data = entity["columns"].copy() Did I approach this wrong?
To give additional context - I am able to access all levels by doing a simple df = json_normalize(api_response['data']['attributes']['total']['children'], record_path = ['children', 'children', 'children']) @ti7
2

Since your children's children has same structure as children, you can try using json_normalize twice separately and append it together.

# For first layer that includes Portfolio_1, Portfolio_2, Portfolio_3
df = pd.json_normalize(s, record_path=['data', 'attributes', 'total', 'children'])

# For second layer that includes The FW Irrev Family Tr, etc
# Use explode to convert list into rows
df_child = pd.json_normalize(df.explode('children').children)

# Combine both
df = pd.concat([df, df_child])

# You can use your column renaming and filtering 

Comments

2

I prefer to use json_normalize. The following code doesn't deal with error handling, detailed formatting, etc, whereas I think the essence of what you want to do the most is included.

Code:

import json
import pandas as pd

# You have to change this path according to the actual json file location.
with open('./api_response.json', 'r') as f:
    api_response = json.load(f)

def unpack_response(r):
    df = pd.DataFrame()

    df_src = pd.json_normalize(r, record_path=['data', 'attributes', 'total', 'children'])
    for _, row in df_src.sort_values('name').iterrows(): 
        df_p = pd.DataFrame(row).T
        df_c = pd.json_normalize(row.children)

        # I'm not sure what your expected sorting order is. Perhaps you might want to delete the next line.
        df_c = df_c.sort_values(['columns._custom_portfolio_target_347209', 'columns.inception_event_date'])

        df = pd.concat([df, df_p, df_c], axis=0, ignore_index=True)

    column_name_mapper = {'columns.' + column['key']: column['display_name'] for column in api_response['meta']['columns']}
    column_name_mapper.update({'name': 'New Entity Group'})
    column_names = ["New Entity Group", "Entity ID", "Adjusted Value (1/31/2022, No Div, USD)", "Adjusted TWR (Current Quarter, No Div, USD)", "Adjusted TWR (YTD, No Div, USD)", "Annualized Adjusted TWR (Since Inception, No Div, USD)", "Inception Date", "Risk Target"]
    df = df.rename(columns=column_name_mapper).reindex(columns=column_names)

    return df

df = unpack_response(api_response)

Output:

New Entity Group Entity ID Adjusted Value (1/31/2022, No Div, USD) Adjusted TWR (Current Quarter, No Div, USD) Adjusted TWR (YTD, No Div, USD) Annualized Adjusted TWR (Since Inception, No Div, USD) Inception Date Risk Target
Portfolio_1 260786 -0.44555 -0.44555 -0.44555 2021-04-07 N/A
The FW Irrev Family Tr 9552252 260786 0 0 0 2022-01-11 N/A
Portfolio_2 1.83967e+07 -0.0578035 -0.0578035 -0.0547021 2021-09-03 Growth
FW DAF 10946585 1.83967e+07 -0.0578035 -0.0578035 -0.0547021 2021-09-03 Growth
Portfolio_3 6.01438e+07 -0.0442006 -0.0442006 0.0774833 2020-12-17
The FW Family Trust 13014080 475356 -0.0610201 -0.0610201 -0.0396857 2021-04-09 Aggressive
FW Liquid Fund LP 13396796 5.28995e+07 -0.0414977 -0.0414977 -0.0414974 2021-12-30 Aggressive
FW Holdings No. 2 LLC 8413655 6.76894e+06 -0.00773093 -0.00773093 0.118448 2021-03-05 N/A
FW and FR Joint 9957007 -1.44 nan nan nan 2021-12-21 N/A

Comments

1
+200

Personally, I would not use pd.json_normalize for this case. Your JSON is quite complex, and unless you're really experienced with json_normalize, the following code may take less time to understand for the average dev. In fact, you don't even need to see the JSON to understand exactly what this code does (although it would certainly help ;).

First, we can extract the objects (portfolios and their children) from the JSON into a list, and use a series of steps to get them in the right form and order:

def prep_obj(o):
    """Prepares an object (portfolio/child) from the JSON to be inserted into a dataframe."""
    return {
        'New Entity Group': o['name'],
    } | o['columns']


# Get a list of lists, where each sub-list contains the portfolio object at index 0 and then the portfolio object's children:
groups = [[prep_obj(o), *[prep_obj(child) for child in o['children']]] for o in api_response['data']['attributes']['total']['children']]

# Sort the portfolio groups by their number:
groups.sort(key=lambda g: int(g[0]['New Entity Group'].split('_')[1]))

# Reverse the children of each portfolio group:
groups = [[g[0]] + g[1:][::-1] for g in groups]

# Flatten out the groups into one large list of objects:
objects = [obj for group in groups for obj in group]
# The above is exactly equivalent to the following:
#   objects = []
#   for group in groups:
#       for obj in group:
#           objects.append(obj)

Next, create the dataframe:

# Create a mapping for column names so that their display names can be used:
mapping = {col['key']: col['display_name'] for col in api_response['meta']['columns']}

# Create a dataframe from the list of objects:
df = pd.DataFrame(objects)

# Correct column names:
df = df.rename(mapping, axis=1)
# Reorder columns:
column_names = ["New Entity Group", "Entity ID", "Adjusted Value (1/31/2022, No Div, USD)", "Adjusted TWR (Current Quarter, No Div, USD)", "Adjusted TWR (YTD, No Div, USD)", "Annualized Adjusted TWR (Since Inception, No Div, USD)", "Inception Date", "Risk Target"]
df = df[column_names]

And formatting:

def format_twr_col(col):
    return (
        col
        .abs()
        .mul(100)
        .round(2)
        .pipe(lambda s: s.where(s.eq(0) | s.isna(), '(' + s.astype(str) + '%)'))
        .pipe(lambda s: s.where(s.ne(0) | s.isna(), s.astype(str) + '%'))
        .fillna('-')
    )

def format_value_col(col):
    positive_mask = col.ge(0)

    col[positive_mask] = (
        col[positive_mask]
        .round()
        .astype(int)
        .map('${:,}'.format)
    )

    col[~positive_mask] = (
        col[~positive_mask]
        .astype(float)
        .round()
        .astype(int)
        .abs()
        .map('(${:,})'.format)
    )
    
    return col

df['Adjusted TWR (Current Quarter, No Div, USD)'] = format_twr_col(df['Adjusted TWR (Current Quarter, No Div, USD)'])
df['Annualized Adjusted TWR (Since Inception, No Div, USD)'] = format_twr_col(df['Annualized Adjusted TWR (Since Inception, No Div, USD)'])
df['Adjusted TWR (YTD, No Div, USD)'] = format_twr_col(df['Adjusted TWR (YTD, No Div, USD)'])

df['Adjusted Value (1/31/2022, No Div, USD)'] = format_value_col(df['Adjusted Value (1/31/2022, No Div, USD)'].copy())

df['Inception Date'] = pd.to_datetime(df['Inception Date']).dt.strftime('%b %d, %Y')

df['Entity ID'] = df['Entity ID'].fillna('')

And... voilà:

>>> pd.options.display.max_columns = None
>>> df
         New Entity Group Entity ID Adjusted Value (1/31/2022, No Div, USD)  Adjusted TWR (Current Quarter, No Div, USD) Adjusted TWR (YTD, No Div, USD)  Annualized Adjusted TWR (Since Inception, No Div, USD) Inception Date  Risk Target
0             Portfolio_1                                          $260,786                                     (44.55%)                        (44.55%)                                            (44.55%)       Apr 07, 2021          N/A
1  The FW Irrev Family Tr   9552252                                $260,786                                         0.0%                            0.0%                                                0.0%       Jan 11, 2022          N/A
2             Portfolio_2                                       $18,396,664                                      (5.78%)                         (5.78%)                                             (5.47%)       Sep 03, 2021       Growth
3                  FW DAF  10946585                             $18,396,664                                      (5.78%)                         (5.78%)                                             (5.47%)       Sep 03, 2021       Growth
4             Portfolio_3                                       $60,143,818                                      (4.42%)                         (4.42%)                                             (7.75%)       Dec 17, 2020          NaN
5     The FW Family Trust  13014080                                $475,356                                       (6.1%)                          (6.1%)                                             (3.97%)       Apr 09, 2021   Aggressive
6       FW Liquid Fund LP  13396796                             $52,899,527                                      (4.15%)                         (4.15%)                                             (4.15%)       Dec 30, 2021   Aggressive
7   FW Holdings No. 2 LLC   8413655                              $6,768,937                                      (0.77%)                         (0.77%)                                            (11.84%)       Mar 05, 2021          N/A
8         FW and FR Joint   9957007                                    ($1)                                            -                               -                                                   -       Dec 21, 2021          N/A

2 Comments

I appreciate there is no obligation to reply, however how would I go a level deeper i.e., children contained children, that contained children. I tried something api_response['data']['attributes']['total']['children']['children] but as you can imagine, it didn't go well!
Hi @William, it's really late here, I'll try to get to you tomorrow. Unfortunately I'm off on a trip tomorrow for a week, so I may not get back until then :/

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.