0

I have the following (excerpt) json data structure:

{
    "apiToken": {
        "createdAt": "2022-03-04T12:18:29.000956Z",
        "expiresAt": "2022-09-04T12:18:29.000956Z"
    },
    "canGenerateApiToken": true,
    "dateJoined": "2021-01-29T10:07:04.395172Z",
    "email": "[email protected]",
    "emailReadOnly": true,
    "emailVerified": true,
    "firstLogin": "2021-01-29T13:01:33.294216Z",
    "fullName": "John Doe",
    "fullNameReadOnly": true,
    "groupsReadOnly": false,
    "id": "32168415841",
    "isSystem": false,
    "lastLogin": "2022-09-12T08:51:00.159750Z",
    "lowestRole": "Admin",
    "primaryTwoFaMethod": "application",
    "scope": "account",
    "scopeRoles": [
        {
            "id": "68418945648943589",
            "name": "AT || ACME Inc.",
            "roleId": "9848949354653168",
            "roleName": "Admin",
            "roles": [
                "Admin"
            ]
        }
    ],
    "siteRoles": [],
    "source": "sso_saml",
    "tenantRoles": [],
    "twoFaEnabled": true
}

I'm trying to write certain data into an excel file with:

df = pd.json_normalize(result)
df.head()
df[['scope', 'fullName', 'email', 'lowestRole', 'scope',
    'scopeRoles.name']].to_excel(completename)

But I struggle with 'scopeRoles.name' as it's nested.

with the code above I get

raise KeyError(f"None of [{key}] are in the [{axis_name}]") KeyError: "None of [Index(['scope', 'fullName', 'email', 'lowestRole', 'scope', 'scopeRoles.name'], dtype='object')] are in the [columns]"

I also tried different versions, but always failed.

I basically need to understand how I can specify the fields to write into excel when the field itself is nested. If I just use "non-nested" entries it works perfectly fine

thanks

2
  • Not gonna lie to you mate, that is kinda of a weird formulated json, sometimes there is dicts and sometimes there is list inside values, anyway flatten_json will help you out after that just try pd normalize Commented Sep 19, 2022 at 14:34
  • I know @INGl0R1AM0R1 but it's out of my control. Commented Sep 19, 2022 at 14:53

2 Answers 2

1

You need to flatten your JSON data file. You could use the flatten_json package.

pip install flatten_json

from flatten_json import flatten

unflat_json = {'user':
           {'Rachel':
            {'UserID': 1717171717,
             'Email': '[email protected]',
             'friends': ['John', 'Jeremy', 'Emily']
             }
            }
           }

flat_json = flatten(unflat_json)

print(flat_json)

Output:

{‘user_Rachel_UserID’: 1717171717, ‘user_Rachel_Email’: ‘[email protected]’, ‘user_Rachel_friends_0’: ‘John’, ‘user_Rachel_friends_1’: ‘Jeremy’, ‘user_Rachel_friends_2’: ‘Emily’}
Sign up to request clarification or add additional context in comments.

2 Comments

The result of "flat_json" looks exactly the same as before (with df = pd.json_normalize(result)) I have written both results to a TXT file and they look exactly the same
From the resulting flat hierarchy you can take the keys names and replace the _ with . and then you have Roles.name instead of Roles_name. The point is you could import the whole flattened json into an excel and choose/use only the needed columns.
0

To deal with a list of dictionaries, you can use df.from_records(). But, you need to process it separately to combine each dataframe together. I assumed the data used is exactly the same, considering the df['scopeRoles'] only consisted of one element. Please try something like this:

import pandas as pd

result = {
    "apiToken": {
        "createdAt": "2022-03-04T12:18:29.000956Z",
        "expiresAt": "2022-09-04T12:18:29.000956Z"
    },
    "canGenerateApiToken": True,
    "dateJoined": "2021-01-29T10:07:04.395172Z",
    "email": "[email protected]",
    "emailReadOnly": True,
    "emailVerified": True,
    "firstLogin": "2021-01-29T13:01:33.294216Z",
    "fullName": "John Doe",
    "fullNameReadOnly": True,
    "groupsReadOnly": False,
    "id": "32168415841",
    "isSystem": False,
    "lastLogin": "2022-09-12T08:51:00.159750Z",
    "lowestRole": "Admin",
    "primaryTwoFaMethod": "application",
    "scope": "account",
    "scopeRoles": [
        {
            "id": "68418945648943589",
            "name": "AT || ACME Inc.",
            "roleId": "9848949354653168",
            "roleName": "Admin",
            "roles": [
                "Admin"
            ]
        }
    ],
    "siteRoles": [],
    "source": "sso_saml",
    "tenantRoles": [],
    "twoFaEnabled": True
}

df = pd.json_normalize(result)
df2 = df[['scope', 'fullName', 'email', 'lowestRole', 'scope']]

# from_records() returns a dataframe from a list of dict df['scopeRoles'].
df3 = df.from_records(df["scopeRoles"][0])

# join df2 and df3
res = df2.join(df3)
print(res)

I hope this code helps!

EDIT

To get the name column only, you just have to subscript like so:

df3 = df.from_records(df["scopeRoles"][0])['name']

1 Comment

The result adds a "data_xxx_" in front of every line, where xxx is an incrementing number. how can I access the content by ignoring the incrementing number?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.