Pandas for reading nested json with json_normalize or read_json

Question

I have a json object with the following structure:

import pandas as pd
json_data_raw = [{"indicator_value":{"195606": 
                                              {"2010":{"AFG": 0.29, 
                                                       "ZWE": 0.607}, 
                                               "2011": {"AFG": 0.406, 
                                                       "ZWE": 0.737}, 
                                               "2012": {"AFG": 0.345, 
                                                       "ZWE": 0.587}, 
                                               "2013": {"AFG": 0.28, 
                                                       "ZWE": 0.871}, 
                                               "2014": {"AFG": 0.253, 
                                                       "ZWE": 0.88}, 
                                               "2015": {"AFG": 0.262, 
                                                       "ZWE": 0.88}, 
                                               "2016": {"AFG": 0.245, 
                                                       "ZWE": 0.77}, 
                                               "2017": {"AFG": 0.247, 
                                                       "ZWE": 0.845}, 
                                               "2018": {"AFG": 0.254, 
                                                       "ZWE": 0.849}}}, 
                  "country_name": {"AFG": "Afghanistan", 
                                   "ZWE": "Zimbabwe"}, 
                  "indicator_name": {"195606": "Carbon dioxide emissions, production emissions per capita (tonnes)"}}]

When I try to read this structure with pd.read_json method I manage to load only the first node of the data, namely "indicator_value", "country_name" and "indicator_name" with the nested instances only.

I tried also to use pd.json_normalize and no success neither since I don't quite get how to specify the parameters record_path or meta (metadata) contained in my json object and getting odd results.

Ideally I would like to get a table like this.

and so on for the other country/year/values...

Probably this is more than a single task and will involve other kind of object manipulation. In any case I much appreciate some help.

Hey ruben, Could you post the desired result in table format? That makes it easier for other people to read your question. — Jonas
– Jonas, Commented Feb 22, 2021 at 17:08
In principle I wanted to add the desired table as an inserted image but apparently this was not allowed. They only give me the option of posting the link though :/ First post, you know... newby here. But thanks for the feedback. — Ruben Lopez
– Ruben Lopez, Commented Feb 22, 2021 at 19:08

Anurag Dabas · Accepted Answer · 2021-02-23 04:34:29Z

Firstly create a variable named d....basically it contains the inner dictionary

d=json_data_raw[0]

Now create a dataframe df:-

df=pd.DataFrame(list(list(d.values())[0].values())[0])

so here list(list(d.values())[0].values())[0] have 3 level deeper dictionary

now,

df=df.stack().to_frame()

Note:-here you can overwrite df or create a new variable the above code make a multi index dataframe so to remove multi index we use:-

df.reset_index(inplace=True)

now to rename columns:-

df.columns=['country_code','year','Carbon dioxide emssions,production emission per capita(tonnes)']

Now we will create a function that will change the country_code column to country name i.e

def country(val):
    if val=='AFG':
        return 'Afganistan'
    else:
        return 'Zimbabwe'

Now we will use apply() method

country=df['country_code'].apply(country)

Now finally we will insert the country column in our dataframe df

df.insert(0,'country_name',country)

Now if you print df you will get your desired output

output:-

    country_name    country_code    year    Carbon dioxide emssions,production emission per capita(tonnes)
0   Afganistan               AFG    2010    0.290
1   Afganistan               AFG    2011    0.406
2   Afganistan               AFG    2012    0.345
3   Afganistan               AFG    2013    0.280
4   Afganistan               AFG    2014    0.253
5   Afganistan               AFG    2015    0.262
6   Afganistan               AFG    2016    0.245
7   Afganistan               AFG    2017    0.247
8   Afganistan               AFG    2018    0.254
9   Zimbabwe                 ZWE    2010    0.607
10  Zimbabwe                 ZWE    2011    0.737
11  Zimbabwe                 ZWE    2012    0.587
12  Zimbabwe                 ZWE    2013    0.871
13  Zimbabwe                 ZWE    2014    0.880
14  Zimbabwe                 ZWE    2015    0.880
15  Zimbabwe                 ZWE    2016    0.770
16  Zimbabwe                 ZWE    2017    0.845
17  Zimbabwe                 ZWE    2018    0.849

Note:- well see I create a function because in your json object there are only 2 country and if you have country more than 2 and you have this same json format then the code before I define/create function country() work as it is.

but then instead of making function and then using apply() method use this:-

countryinfo=list(json_data_raw[0].values())[1]

Now if you print countryinfo you get a dictionary in which keys are country code and its values are country name

Output of countryinfo:-

{'AFG': 'Afghanistan', 'ZWE': 'Zimbabwe'}

well by list(json_data_raw[0].values())[1] we are grabbing the data of "country_name" which is stored inside your json object:- json_data_raw and storing it in a variable named country info

so now

country=df['country_code'].replace(countryinfo.keys(),countryinfo.values())

so here basically we are replacing the value of df['country_code'] with the values of countryinfo dictionary so it will returns a Series of values and we are storing that Series in variable named country

and finally no we insert 'country_name' column at index 0 in our dataframe df

df.insert(0,'country_name',country)

Now if you print df then you will get your desired output regardless of number of countries you have in your json_data_raw

Output of df:-

    country_name    country_code    year    Carbon dioxide emssions,production emission per capita(tonnes)
0   Afganistan               AFG    2010    0.290
1   Afganistan               AFG    2011    0.406
2   Afganistan               AFG    2012    0.345
3   Afganistan               AFG    2013    0.280
4   Afganistan               AFG    2014    0.253
5   Afganistan               AFG    2015    0.262
6   Afganistan               AFG    2016    0.245
7   Afganistan               AFG    2017    0.247
8   Afganistan               AFG    2018    0.254
9   Zimbabwe                 ZWE    2010    0.607
10  Zimbabwe                 ZWE    2011    0.737
11  Zimbabwe                 ZWE    2012    0.587
12  Zimbabwe                 ZWE    2013    0.871
13  Zimbabwe                 ZWE    2014    0.880
14  Zimbabwe                 ZWE    2015    0.880
15  Zimbabwe                 ZWE    2016    0.770
16  Zimbabwe                 ZWE    2017    0.845
17  Zimbabwe                 ZWE    2018    0.849

This is exactly what I needed. Thanks a lot @anurag-dabas!!! plus the detailed explanation.
How would you generalize if you for instance don't know the number of countries @Anurag in your function that generates the country names?
@Ruben Lopez I added the answer of your question in my answer.....so just have a look

Collectives™ on Stack Overflow

Pandas for reading nested json with json_normalize or read_json

1 Answer 1

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related