python parse data from nested json

Question

I have the following JSON data structure (excerpt):

{
    "apiToken": {
        "createdAt": "2022-03-04T12:18:29.000956Z",
        "expiresAt": "2022-09-04T12:18:29.000956Z"
    },
    "canGenerateApiToken": true,
    "dateJoined": "2021-01-29T10:07:04.395172Z",
    "email": "[email protected]",
    "emailReadOnly": true,
    "emailVerified": true,
    "firstLogin": "2021-01-29T13:01:33.294216Z",
    "fullName": "John Doe",
    "fullNameReadOnly": true,
    "groupsReadOnly": false,
    "id": "32168415841",
    "isSystem": false,
    "lastLogin": "2022-09-12T08:51:00.159750Z",
    "lowestRole": "Admin",
    "primaryTwoFaMethod": "application",
    "scope": "account",
    "scopeRoles": [
        {
            "id": "68418945648943589",
            "name": "AT || ACME Inc.",
            "roleId": "9848949354653168",
            "roleName": "Admin",
            "roles": [
                "Admin"
            ]
        }
    ],
    "siteRoles": [],
    "source": "sso_saml",
    "tenantRoles": [],
    "twoFaEnabled": true
}

I'm trying to write certain data into an Excel file with:

df = pd.json_normalize(result)
df.head()
df[['scope', 'fullName', 'email', 'lowestRole', 'scope',
    'scopeRoles.name']].to_excel(completename)

But I struggle with 'scopeRoles.name' as it's nested.

With the code above I get:

raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['scope', 'fullName', 'email', 'lowestRole', 'scope', 'scopeRoles.name'], dtype='object')] are in the [columns]"

I also tried other variations, but they all failed.

I basically need to understand how to specify the fields to write into Excel when a field itself is nested. If I only use non-nested entries, it works perfectly fine.

Thanks!
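A quick way to see why `scopeRoles.name` is missing is to inspect the columns that `pd.json_normalize` produces. The sketch below uses a trimmed copy of the record above (only a few keys kept, for brevity): nested dicts such as `apiToken` become dotted columns, but the `scopeRoles` list is kept whole in a single column, so no `scopeRoles.name` column exists. Note that pandas only uses the "None of [...]" wording when every requested column is missing, which hints that the real `result` may be shaped differently from the excerpt (for example, wrapped in an outer key).

```python
import pandas as pd

# Trimmed copy of the record from the question (only a few keys kept).
result = {
    "apiToken": {"createdAt": "2022-03-04T12:18:29.000956Z"},
    "fullName": "John Doe",
    "scope": "account",
    "scopeRoles": [{"id": "68418945648943589", "name": "AT || ACME Inc."}],
}

df = pd.json_normalize(result)

# Nested dicts are flattened into dotted column names...
print("apiToken.createdAt" in df.columns)  # True
# ...but a list of dicts stays in one column, holding the original list.
print("scopeRoles" in df.columns)          # True
print("scopeRoles.name" in df.columns)     # False
print(df.loc[0, "scopeRoles"])
```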

Not gonna lie to you, mate, that is kind of a weirdly formulated JSON: sometimes there are dicts and sometimes there are lists inside the values. Anyway, flatten_json will help you out; after that, just try pd.json_normalize.
– INGl0R1AM0R1, Commented Sep 19, 2022 at 14:34

Petronella · Accepted Answer · 2022-09-19 14:31:07Z


You need to flatten your JSON data file. You could use the flatten_json package.

pip install flatten_json

from flatten_json import flatten

unflat_json = {'user':
           {'Rachel':
            {'UserID': 1717171717,
             'Email': '[email protected]',
             'friends': ['John', 'Jeremy', 'Emily']
             }
            }
           }

flat_json = flatten(unflat_json)

print(flat_json)

Output:

{'user_Rachel_UserID': 1717171717, 'user_Rachel_Email': '[email protected]', 'user_Rachel_friends_0': 'John', 'user_Rachel_friends_1': 'Jeremy', 'user_Rachel_friends_2': 'Emily'}
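Because flatten_json is a third-party package, here is a minimal standard-library sketch of the same idea. `flatten_record` is a hypothetical helper, not the package's `flatten` function: it joins nested dict keys and list indices with `_`, which gives the same keys as the output above. Applied to the question's data, the first role's name lands under `scopeRoles_0_name`; the list index stays in the key. Empty lists and dicts produce no keys in this sketch (flatten_json may handle them differently).

```python
def flatten_record(obj, parent_key="", sep="_"):
    """Flatten nested dicts/lists into one dict with joined keys.

    Hypothetical helper: dict keys and list indices are joined with `sep`.
    Empty dicts/lists contribute no keys at all.
    """
    if isinstance(obj, dict):
        children = obj.items()
    elif isinstance(obj, list):
        children = enumerate(obj)
    else:
        # Leaf value: store it under the key built so far.
        return {parent_key: obj}

    items = {}
    for key, value in children:
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        items.update(flatten_record(value, new_key, sep))
    return items


rachel = {"user": {"Rachel": {"UserID": 1717171717,
                              "friends": ["John", "Jeremy", "Emily"]}}}
print(flatten_record(rachel))

roles = {"scope": "account", "scopeRoles": [{"name": "AT || ACME Inc."}]}
print(flatten_record(roles))  # key is "scopeRoles_0_name", not "scopeRoles_name"
```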

answered Sep 19, 2022 at 14:31

Petronella

2 Comments

f0rd42 Over a year ago

The result of "flat_json" looks exactly the same as before (with df = pd.json_normalize(result)). I have written both results to a TXT file and they look exactly the same.

Petronella Over a year ago

From the resulting flat hierarchy you can take the key names and replace the _ with . so that you get Roles.name instead of Roles_name. The point is that you can import the whole flattened JSON into Excel and choose/use only the columns you need.
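A minimal sketch of that rename-then-select idea with pandas. The `flat` dict is typed by hand, with values copied from the question, in the underscore-joined shape that flattening produces (it is not actual flatten_json output). One caveat the sketch makes visible: the list index survives the rename, so the column is `scopeRoles.0.name`, not `scopeRoles.name`.

```python
import pandas as pd

# Hand-written flattened record (assumed shape, values from the question).
flat = {
    "scope": "account",
    "fullName": "John Doe",
    "lowestRole": "Admin",
    "apiToken_createdAt": "2022-03-04T12:18:29.000956Z",
    "scopeRoles_0_name": "AT || ACME Inc.",
    "scopeRoles_0_roleName": "Admin",
}

# One flattened record becomes one DataFrame row.
df = pd.DataFrame([flat])

# Swap the separator so names look like json_normalize's dotted ones.
# Caveat: this also rewrites any original key that itself contains "_".
df = df.rename(columns=lambda c: c.replace("_", "."))

# Keep only the needed columns; note the ".0." from the list index.
selected = df[["scope", "fullName", "lowestRole", "scopeRoles.0.name"]]
print(selected)
# selected.to_excel(completename)  # as in the question; needs an Excel engine such as openpyxl
```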

Coderio · Accepted Answer · 2022-09-20 03:33:17Z

To deal with a list of dictionaries, you can use DataFrame.from_records(). However, you need to process the list separately and then combine the resulting DataFrames. I assumed the data is exactly as shown, i.e. the list in df['scopeRoles'] contains only one element. Please try something like this:

import pandas as pd

result = {
    "apiToken": {
        "createdAt": "2022-03-04T12:18:29.000956Z",
        "expiresAt": "2022-09-04T12:18:29.000956Z"
    },
    "canGenerateApiToken": True,
    "dateJoined": "2021-01-29T10:07:04.395172Z",
    "email": "[email protected]",
    "emailReadOnly": True,
    "emailVerified": True,
    "firstLogin": "2021-01-29T13:01:33.294216Z",
    "fullName": "John Doe",
    "fullNameReadOnly": True,
    "groupsReadOnly": False,
    "id": "32168415841",
    "isSystem": False,
    "lastLogin": "2022-09-12T08:51:00.159750Z",
    "lowestRole": "Admin",
    "primaryTwoFaMethod": "application",
    "scope": "account",
    "scopeRoles": [
        {
            "id": "68418945648943589",
            "name": "AT || ACME Inc.",
            "roleId": "9848949354653168",
            "roleName": "Admin",
            "roles": [
                "Admin"
            ]
        }
    ],
    "siteRoles": [],
    "source": "sso_saml",
    "tenantRoles": [],
    "twoFaEnabled": True
}

df = pd.json_normalize(result)
# keep only the needed top-level columns ('scope' listed once)
df2 = df[['scope', 'fullName', 'email', 'lowestRole']]

# df["scopeRoles"][0] is the list of role dicts stored in row 0;
# from_records() turns that list into a DataFrame (one row per role).
df3 = pd.DataFrame.from_records(df["scopeRoles"][0])

# join df2 and df3
res = df2.join(df3)
print(res)

I hope this code helps!
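As an alternative to from_records() plus join (a different technique from the one above), pd.json_normalize can expand the list itself: `record_path` names the list to turn into rows, `meta` lists the top-level fields to repeat on each of those rows, and `record_prefix` gives the role columns dotted names such as `scopeRoles.name`. A sketch on a trimmed copy of the question's record; one thing to keep in mind is that a user whose `scopeRoles` list is empty produces no rows with this approach.

```python
import pandas as pd

# Trimmed copy of the question's record (only the keys used below).
result = {
    "scope": "account",
    "fullName": "John Doe",
    "lowestRole": "Admin",
    "scopeRoles": [
        {"id": "68418945648943589", "name": "AT || ACME Inc.",
         "roleId": "9848949354653168", "roleName": "Admin", "roles": ["Admin"]},
    ],
}

# One row per entry in scopeRoles, with the meta fields copied onto each row.
df = pd.json_normalize(
    result,
    record_path="scopeRoles",
    meta=["scope", "fullName", "lowestRole"],
    record_prefix="scopeRoles.",
)

print(df[["scope", "fullName", "lowestRole", "scopeRoles.name"]])
```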

EDIT

To get the name column only, you just have to subscript like so:

df3 = pd.DataFrame.from_records(df["scopeRoles"][0])['name']
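The `[0]` above selects row 0, so only the first user's roles are read. If `result` holds several users, one option (a sketch, not the answer's method) is to build a `scopeRoles.name` column per row by joining each user's role names into one string, which keeps one row per user for the Excel sheet. The two-user input, including "Jane Roe" and the site names, is hypothetical.

```python
import pandas as pd

# Hypothetical input: the second user has two scope roles.
users = [
    {"fullName": "John Doe", "scope": "account",
     "scopeRoles": [{"name": "AT || ACME Inc."}]},
    {"fullName": "Jane Roe", "scope": "account",
     "scopeRoles": [{"name": "Site A"}, {"name": "Site B"}]},
]

df = pd.json_normalize(users)

# Collapse each row's list of role dicts into a comma-separated string.
df["scopeRoles.name"] = df["scopeRoles"].apply(
    lambda roles: ", ".join(role["name"] for role in roles)
)

print(df[["fullName", "scope", "scopeRoles.name"]])
```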

The result adds a "data_xxx_" prefix in front of every line, where xxx is an incrementing number. How can I access the content while ignoring the incrementing number?
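One way to handle that prefix, assuming the flattened keys look like `data_<n>_<field>` (inferred from the comment above; the sample keys below are hypothetical), is to split each key with a regular expression and regroup the values into one dict per record. `regroup` is an illustrative helper, not a library function.

```python
import re
from collections import defaultdict

# Matches keys like "data_12_fullName": group 1 = record index, group 2 = field.
PREFIX = re.compile(r"^data_(\d+)_(.+)$")


def regroup(flat):
    """Split 'data_<n>_<field>' keys back into one dict per record."""
    rows = defaultdict(dict)
    for key, value in flat.items():
        match = PREFIX.match(key)
        if match:  # keys without the prefix are skipped
            index, field = match.groups()
            rows[int(index)][field] = value
    # Sort numerically so data_10 comes after data_9.
    return [rows[i] for i in sorted(rows)]


flat = {
    "data_0_fullName": "John Doe",
    "data_0_scopeRoles_0_name": "AT || ACME Inc.",
    "data_1_fullName": "Jane Roe",
}
records = regroup(flat)
print(records[0]["scopeRoles_0_name"])
```

If the prefix does come from an outer `"data"` list in the API response, passing that list straight to `pd.json_normalize` (without flattening first) would avoid the numbered keys altogether.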
