How to create a pandas dataframe from a nested json file?

Question

I have a json file which looks like this:

{
    "data": {
        "success": true,
        "timeseries": true,
        "start_date": "2022-10-01",
        "end_date": "2022-10-04",
        "base": "EUR",
        "rates": {
            "2022-10-01": {
                "NG": 0.1448939471560284
            },
            "2022-10-02": {
                "NG": 0.14487923291390148
            },
            "2022-10-03": {
                "NG": 0.1454857922753868
            },
            "2022-10-04": {
                "NG": 0.1507352356663182
            }
        },
        "unit": "per MMBtu"
    }
}

I want to create a dataframe which looks like this:

Date        NG        base 
2022-10-01  0.144894  EUR
2022-10-02  0.144879  EUR
2022-10-03  0.145486  EUR
2022-10-04  0.150735  EUR

This is what I tried:

with open(r'C:\Users\EH\Desktop\tools\json_files\blue_file.json','r') as f:
    data = json.loads(f.read())

df1 = pd.DataFrame(data['data']['rates'])
df1 = df1.T
df2 = pd.DataFrame(data['data'])
df2 = df2.base

merge = [df1, df2]
df3 = pd.concat(merge)

print(df3)

My current output:

                  NG    0
2022-10-01  0.144894  NaN
2022-10-02  0.144879  NaN
2022-10-03  0.145486  NaN
2022-10-04  0.150735  NaN
2022-10-01       NaN  EUR
2022-10-02       NaN  EUR
2022-10-03       NaN  EUR
2022-10-04       NaN  EUR

As you can see something is not going okay, I dont understand where the NaNs come from.

Gonçalo Peres · Accepted Answer · 2022-10-06 14:44:44Z

Assuming that the file is stored in a variable called data

data = { "data": { "success": True, "timeseries": True, "start_date": "2022-10-01", "end_date": "2022-10-04", "base": "EUR", "rates": { "2022-10-01": { "NG": 0.1448939471560284 }, "2022-10-02": { "NG": 0.14487923291390148 }, "2022-10-03": { "NG": 0.1454857922753868 }, "2022-10-04": { "NG": 0.1507352356663182 } }, "unit": "per MMBtu" } }

First, one will read the data to a dataframe with pandas.DataFrame.from_dict as follows

df = pd.DataFrame.from_dict(data['data']['rates'], orient='index')

[Out]:
                  NG
2022-10-01  0.144894
2022-10-02  0.144879
2022-10-03  0.145486
2022-10-04  0.150735

But, as one wants doesn't want the columns with date to be the index, and one wants that column to have the name Date, one will start by resetting the index using pandas.DataFrame.reset_index

df = df.reset_index()

[Out]:
        index        NG
0  2022-10-01  0.144894
1  2022-10-02  0.144879
2  2022-10-03  0.145486
3  2022-10-04  0.150735

And then one will be able to rename the columns to the desired output format pandas.DataFrame.rename

df = df.rename(columns={'index':'Date'})

[Out]:

         Date        NG
0  2022-10-01  0.144894
1  2022-10-02  0.144879
2  2022-10-03  0.145486
3  2022-10-04  0.150735

As it is still missing the base column, one can pick it up from the data as follows

df['base'] = data['data']['base']

[Out]:

         Date        NG base
0  2022-10-01  0.144894  EUR
1  2022-10-02  0.144879  EUR
2  2022-10-03  0.145486  EUR
3  2022-10-04  0.150735  EUR

Even though one already has the desired output, one can wrap everything into a one-liner as follows

df = pd.DataFrame.from_dict(data['data']['rates'], orient='index').reset_index().rename(columns={'index':'Date'}).assign(base=data['data']['base'])

[Out]:
         Date        NG base
0  2022-10-01  0.144894  EUR
1  2022-10-02  0.144879  EUR
2  2022-10-03  0.145486  EUR
3  2022-10-04  0.150735  EUR

Lipton · Accepted Answer · 2022-10-06 14:25:04Z

1

Using what you had done, you just needed to specify the axis on which you want to concatenate the 2 dataframes by default pandas concatenated one below the other.

pd.concat(merge, axis=1) # default 0

answered Oct 6, 2022 at 14:25

Lipton

112 bronze badges

1 Comment

Intelligent Shade of Blue Over a year ago

Plus one for answering the question.

Code Different · Accepted Answer · 2022-10-06 14:18:01Z

0

JSON is a very flexible format and no function in pandas can parse all types of JSON. You need to preprocess the JSON before constructing the dataframe:

with open("data.json") as fp:
    data = json.load(fp)

df = pd.DataFrame(
    [(date, rate["NG"]) for date, rate in data["data"]["rates"].items()],
    columns=["Date", "NG"],
).assign(base=data["data"]["base"])

answered Oct 6, 2022 at 14:18

Code Different

93.4k16 gold badges154 silver badges175 bronze badges

1 Comment

Erling_Haaland Over a year ago

That is some very long code of line. How do you even come up with it? But it works!

Intelligent Shade of Blue · Accepted Answer · 2022-10-06 14:40:28Z

0

You were almost there. The reason you are getting NaNs is that you need to specify different axis when concatenating the dataframes: pd.concat(merge, axis=1).

Here is my version:

import json
import pandas as pd

with open("blue_file.json", "r") as f:
    data = json.loads(f.read())

df = pd.DataFrame.from_dict(data["data"]["rates"], orient="index")
df["base"] = data["data"]["base"]
df

Output:

                  NG base
2022-10-01  0.144894  EUR
2022-10-02  0.144879  EUR
2022-10-03  0.145486  EUR
2022-10-04  0.150735  EUR

answered Oct 6, 2022 at 14:40

Intelligent Shade of Blue

6,48828 silver badges56 bronze badges

Comments

Bushmaster · Accepted Answer · 2022-10-06 15:38:16Z

0

simply:

import json
with open(r'path.json','r') as f:
    data = json.loads(f.read())

df=pd.DataFrame(data['data']).reset_index()
df=df.join(pd.json_normalize(df.pop('rates')))
print(df)

df=df[['index','NG','base']]
df=df.rename(columns={'index':'Date'})
'''
    index      success  timeseries  start_date  end_date    base    unit    NG
0   2022-10-01  true    true    2022-10-01  2022-10-04  EUR per MMBtu   0.1448939471560284
1   2022-10-02  true    true    2022-10-01  2022-10-04  EUR per MMBtu   0.14487923291390148
2   2022-10-03  true    true    2022-10-01  2022-10-04  EUR per MMBtu   0.1454857922753868
3   2022-10-04  true    true    2022-10-01  2022-10-04  EUR per MMBtu   0.1507352356663182

'''

edited Oct 6, 2022 at 15:38

answered Oct 6, 2022 at 14:27

Bushmaster

4,6364 gold badges11 silver badges34 bronze badges

2 Comments

Intelligent Shade of Blue Over a year ago

Not going to downvote, but the output doesn't look quite the same as what the OP is asking.

Bushmaster Over a year ago

I thought the person asking the question could filter the columns. But you're right, I'm updating.

Collectives™ on Stack Overflow

How to create a pandas dataframe from a nested json file?

5 Answers 5

1 Comment

1 Comment

1 Comment

Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

1 Comment

1 Comment

1 Comment

Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related