json_normalize to DataFrame for nested objects, Python

Question

I am trying to convert JSON to a DataFrame using pd.json_normalize. This is the JSON I am working with:

data = {
    "Parent": [
        {
            "Attributes": [
                {
                    "Values": [{"Month": "Jan", "Value": "100"}],
                    "Id": "90",
                    "CustId": "3"
                },
                {
                    "Values": [{"Month": "Jan", "Value": "101"}],
                    "Id": "88"
                },
                {
                    "Values": [{"Month": "Jan", "Value": "102"}],
                    "Id": "89"
                }
            ],
            "DId": "1234"
        },
        {
            "Attributes": [
                {
                    "Values": [{"Month": "Jan", "Value": "200"}],
                    "Id": "90",
                    "CustId": "3"
                },
                {
                    "Values": [{"Month": "Jan", "Value": "201"}],
                    "Id": "88"
                },
                {
                    "Values": [{"Month": "Jan", "Value": "202"}],
                    "Id": "89"
                }
            ],
            "DId": "5678"
        }
    ]
}

This is what I have tried:

import pandas as pd

print(type(data))
result = pd.json_normalize(data, record_path=['Parent', ['Attributes']], max_level=2)
print(result.to_string())

It returned a result, but the DId column is missing and the Values column is still a list of dicts.

What I want to achieve is a flat table that includes DId and has the entries of Values expanded into their own columns.

Any guidance on how to accomplish this would be highly appreciated.

Timus · Accepted Answer · 2023-01-24 11:31:25Z

You can specify metadata (fields that sit above the record_path level) via the meta keyword argument, combined with errors='ignore' for metadata that is not always present, like CustId. For instance:

result = pd.json_normalize(
    data,
    record_path=['Parent', 'Attributes', 'Values'],
    meta=[
        ['Parent', 'DId'],
        ['Parent', 'Attributes', 'Id'],
        ['Parent', 'Attributes', 'CustId']
    ],
    errors='ignore'
)

results in

  Month Value Parent.DId Parent.Attributes.Id Parent.Attributes.CustId
0   Jan   100       1234                   90                        3
1   Jan   101       1234                   88                      NaN
2   Jan   102       1234                   89                      NaN
3   Jan   200       5678                   90                        3
4   Jan   201       5678                   88                      NaN
5   Jan   202       5678                   89                      NaN
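
To make the roles of record_path and meta more concrete, here is a rough plain-Python equivalent of the call above. It is only a sketch of the idea, not how pandas implements json_normalize; the helper name flatten_values and the trimmed-down sample are made up for this illustration. One difference to keep in mind: pandas fills a missing CustId with NaN, while this sketch uses None.

```python
def flatten_values(data):
    """Hypothetical helper: one row per entry in Values, plus the fields above it."""
    rows = []
    for parent in data["Parent"]:
        for attr in parent["Attributes"]:
            # The record_path ends at Values: each entry becomes one row.
            for value in attr["Values"]:
                row = dict(value)  # Month, Value
                # meta: fields taken from the levels above the records
                row["Parent.DId"] = parent["DId"]
                row["Parent.Attributes.Id"] = attr["Id"]
                # .get() plays the role of errors='ignore' for an optional key
                row["Parent.Attributes.CustId"] = attr.get("CustId")
                rows.append(row)
    return rows


# Trimmed-down sample with the same shape as the data in the question
sample = {
    "Parent": [
        {
            "Attributes": [
                {"Values": [{"Month": "Jan", "Value": "100"}], "Id": "90", "CustId": "3"},
                {"Values": [{"Month": "Jan", "Value": "101"}], "Id": "88"},
            ],
            "DId": "1234",
        }
    ]
}

rows = flatten_values(sample)
for row in rows:
    print(row)
```

Each level of the nested loop corresponds to one element of record_path, and every meta entry is read from the loop variable at the matching depth.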

simpleApp · Accepted Answer · 2023-01-24 04:44:15Z

This is one way of achieving it. I think step 1 and step 2 could be combined, but that needs more insight into pd.json_normalize.

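import pandas as pd

# step 1: one row per entry in Values (columns: Month, Value)
df1 = pd.json_normalize(data['Parent'], record_path=['Attributes', 'Values'])

# step 2: one row per attribute, carrying DId down from the parent
df2 = pd.json_normalize(data['Parent'], record_path='Attributes', meta='DId')
df2 = df2.drop(columns=['Values'])

# The join is positional (on the default index); the rows line up only
# because each attribute here has exactly one entry in Values.
result = (
    df2.join(df1)
    .reindex(['DId', 'Id', 'CustId', 'Month', 'Value'], axis=1)
    .sort_values(by=['DId', 'Id'])
    .rename(columns={
        'Id': 'Attr.Id',
        'CustId': 'Attr.CustId',
        'Month': 'Attr.Values.Month',
        'Value': 'Attr.Values.value',
    })
)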

result:
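
On the remark that step 1 and step 2 might be combined: a single json_normalize call appears to be enough when the meta paths are given relative to data['Parent']. The following is a minimal sketch on a trimmed-down sample (the sample dict is an assumption made to keep the example self-contained), assuming a pandas version that accepts nested meta paths together with a nested record_path.

```python
import pandas as pd

# Trimmed-down sample with the same shape as the data in the question
sample = {
    "Parent": [
        {
            "Attributes": [
                {"Values": [{"Month": "Jan", "Value": "100"}], "Id": "90", "CustId": "3"},
                {"Values": [{"Month": "Jan", "Value": "101"}], "Id": "88"},
            ],
            "DId": "1234",
        },
        {
            "Attributes": [
                {"Values": [{"Month": "Jan", "Value": "200"}], "Id": "90", "CustId": "3"},
            ],
            "DId": "5678",
        },
    ]
}

result = pd.json_normalize(
    sample["Parent"],
    record_path=["Attributes", "Values"],  # one row per entry in Values
    meta=[
        "DId",                     # taken from each Parent item
        ["Attributes", "Id"],      # taken from each attribute
        ["Attributes", "CustId"],  # optional key, hence errors="ignore"
    ],
    errors="ignore",
)
print(result.to_string())
```

Unlike the join, this does not depend on each attribute having exactly one entry in Values, since the metadata is repeated for every record pulled from the same attribute.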

