Nested JSON to Pandas DataFrame

Question

Input JSON

{
    "tables": {
        "Cloc": {
            "MINT": {
                "CANDY": {
                    "Mob": [{
                        "loc": "AA",
                        "loc2": ["AAA"]
                    }, {
                        "loc": "AA",
                        "loc2": ["AAA"]
                    }]
                }
            }
        },
        "T1": {
            "MINT": {
                "T2": {
                    "T3": [{
                        "loc": "AAA",
                        "loc2": ["AAA"]
                    }]
                }
            }
        }
    }
}

Expected Output

=========================================

I have tried processing this nested JSON using pd.json_normalize(), and by building a DataFrame from a single branch directly:

data = pd.DataFrame(nested_json['tables']['Cloc']['MINT']['CANDY']['Mob'])

I have no clue how to proceed. Any help or guidance is highly appreciated.

Many Thanks!!

Hello Roman, thanks for the reply! Yes, there can be more keys such as loc3, loc4, etc.
– Alind Billore, Commented Jan 23, 2023 at 20:31
Are there any constraints? Can there be a list of dictionaries under the CANDY key? Or does reaching a list mean we are in the "deepest" dictionary?
– 0Interest, Commented Jan 23, 2023 at 21:23

unknown person · Accepted Answer · 2023-01-23 22:27:02Z

Assuming only the final level consists of a list of dictionaries, you can build the rows directly by walking the structure recursively, which works for any number of nesting levels.

import pandas as pd

# d is the parsed JSON from the question (called nested_json there)
rows = []

def find_rows(x, current_row):
    if isinstance(x, dict):
        # still descending: record the key and recurse into the value
        for k, v in x.items():
            find_rows(v, current_row + [k])
    else:  # we are at the final level, where we have a list of dictionaries
        for locs_map in x:
            for v in locs_map.values():
                rows.append(current_row + [v])

find_rows(d['tables'], [])
# Now I'm assuming you have only the number of levels as in your example
data = pd.DataFrame.from_records(rows, columns=['Tables', 'L_1', 'L_2', 'L_3', 'L_4'])
# Lists are unhashable, so compare string forms to drop duplicate rows
data = data.loc[data.astype(str).drop_duplicates().index]
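
If you would rather not rely on a module-level rows list or hard-code the column names, here is a minimal sketch of the same idea as a generator. The names collect_rows, sample, and frame are mine, not from the question. It still assumes that every non-dict node is a list of flat dictionaries and that all leaves sit at the same depth, so every row has the same width.

```python
import pandas as pd

def collect_rows(node, path=()):
    """Yield one row per leaf value: the keys leading to it, then the value."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from collect_rows(child, path + (key,))
    else:
        # Assumption: anything that is not a dict is a list of flat dicts.
        for record in node:
            for value in record.values():
                yield list(path) + [value]

# The JSON from the question, written as a Python literal.
sample = {
    "tables": {
        "Cloc": {"MINT": {"CANDY": {"Mob": [
            {"loc": "AA", "loc2": ["AAA"]},
            {"loc": "AA", "loc2": ["AAA"]},
        ]}}},
        "T1": {"MINT": {"T2": {"T3": [{"loc": "AAA", "loc2": ["AAA"]}]}}},
    }
}

rows = list(collect_rows(sample["tables"]))

# Derive the column names from the row width instead of hard-coding them.
width = max(len(row) for row in rows)
columns = ["Tables"] + [f"L_{i}" for i in range(1, width)]

frame = pd.DataFrame(rows, columns=columns)
# Some cells hold lists, which are unhashable, so deduplicate on string forms.
frame = frame.loc[frame.astype(str).drop_duplicates().index].reset_index(drop=True)
print(frame)
```

With the sample input this leaves four rows, because the two identical entries under Mob collapse into one pair. If your real data has branches of different depth, the column naming above would need more thought.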

