1

Here is the example JSON:

{
    "ApartmentBuilding":{
        "Address":{
            "HouseNumber": 5,
            "Street": "DataStreet",
            "ZipCode": 5100
        },
        "Apartments":[
            {
                "Number": 1,
                "Price": 500,
                "Residents": [
                    {
                        "Name": "Bob",
                        "Age": 43
                    },
                    {
                        "Name": "Alice",
                        "Age": 42
                    }
                ]
            },
            {
                "Number": 2,
                "Price": 750,
                "Residents": [
                    {
                        "Name": "Jane",
                        "Age": 43
                    },
                    {
                        "Name": "William",
                        "Age": 42
                    }
                ]
            },
            {
                "Number": 3,
                "Price": 1000,
                "Residents": []
            }
        ]
    }
}

I used below function from : Python Pandas - Flatten Nested JSON

import json
import pandas as pd

def flatten_json(nested_json: dict, exclude: list=['']) -> dict:
    """
    Flatten a list of nested dicts.
    """
    out = dict()
    def flatten(x: (list, dict, str), name: str='', exclude=exclude):
        if type(x) is dict:
            for a in x:
                if a not in exclude:
                    flatten(x[a], f'{name}{a}_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, f'{name}{i}_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out

with open("apartments.json") as f:
    data = json.load(f)

print(data)

df = pd.DataFrame([flatten_json(x) for x in data['ApartmentBuilding']])

print(df)

I am aiming to completely flatten the JSON to be transformed into a Panda Dataframe however, get a strange output as belows:

0     Address
1  Apartments

I am basically after something flatten like this:

enter image description here

1 Answer 1

2

Here is another way to do it using Pandas json_normalize and explode:

import json

import pandas as pd

with open("file.json") as f:
    data = json.load(f)

df = pd.json_normalize(data["ApartmentBuilding"])

First, get and flatten the different parts of the dataframe:

building = df.explode("Apartments").reset_index(drop=True)

apartments = pd.DataFrame(building["Apartments"].to_dict()).T.explode("Residents")

residents = pd.DataFrame(
    apartments["Residents"].dropna().reset_index(drop=True).to_list()
)

Then, build the dataframe back by merging and concatenating the flattened parts:

new_df = pd.merge(
    left=building.loc[:, building.columns != "Apartments"],
    right=apartments.loc[:, apartments.columns != "Residents"],
    right_index=True,
    left_index=True,
).reset_index(drop=True)

new_df = pd.concat([new_df, residents], axis=1)

Do a little cleanup:

new_df.columns = [col.replace("Address.", "") for col in new_df.columns]

And finally:

print(new_df)
# Output
   HouseNumber      Street  ZipCode Number Price     Name   Age
0            5  DataStreet     5100      1   500      Bob  43.0
1            5  DataStreet     5100      1   500    Alice  42.0
2            5  DataStreet     5100      2   750     Jane  43.0
3            5  DataStreet     5100      2   750  William  42.0
4            5  DataStreet     5100      3  1000      NaN   NaN
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.