3

I have a json file which looks like this:

{
    "data": {
        "success": true,
        "timeseries": true,
        "start_date": "2022-10-01",
        "end_date": "2022-10-04",
        "base": "EUR",
        "rates": {
            "2022-10-01": {
                "NG": 0.1448939471560284
            },
            "2022-10-02": {
                "NG": 0.14487923291390148
            },
            "2022-10-03": {
                "NG": 0.1454857922753868
            },
            "2022-10-04": {
                "NG": 0.1507352356663182
            }
        },
        "unit": "per MMBtu"
    }
}

I want to create a dataframe which looks like this:

Date        NG        base 
2022-10-01  0.144894  EUR
2022-10-02  0.144879  EUR
2022-10-03  0.145486  EUR
2022-10-04  0.150735  EUR

This is what I tried:

with open(r'C:\Users\EH\Desktop\tools\json_files\blue_file.json','r') as f:
    data = json.loads(f.read())

df1 = pd.DataFrame(data['data']['rates'])
df1 = df1.T
df2 = pd.DataFrame(data['data'])
df2 = df2.base

merge = [df1, df2]
df3 = pd.concat(merge)

print(df3)

My current output:

                  NG    0
2022-10-01  0.144894  NaN
2022-10-02  0.144879  NaN
2022-10-03  0.145486  NaN
2022-10-04  0.150735  NaN
2022-10-01       NaN  EUR
2022-10-02       NaN  EUR
2022-10-03       NaN  EUR
2022-10-04       NaN  EUR

As you can see something is not going okay, I dont understand where the NaNs come from.

5 Answers 5

1

Assuming that the file is stored in a variable called data

data = { "data": { "success": True, "timeseries": True, "start_date": "2022-10-01", "end_date": "2022-10-04", "base": "EUR", "rates": { "2022-10-01": { "NG": 0.1448939471560284 }, "2022-10-02": { "NG": 0.14487923291390148 }, "2022-10-03": { "NG": 0.1454857922753868 }, "2022-10-04": { "NG": 0.1507352356663182 } }, "unit": "per MMBtu" } }

First, one will read the data to a dataframe with pandas.DataFrame.from_dict as follows

df = pd.DataFrame.from_dict(data['data']['rates'], orient='index')

[Out]:
                  NG
2022-10-01  0.144894
2022-10-02  0.144879
2022-10-03  0.145486
2022-10-04  0.150735

But, as one wants doesn't want the columns with date to be the index, and one wants that column to have the name Date, one will start by resetting the index using pandas.DataFrame.reset_index

df = df.reset_index()

[Out]:
        index        NG
0  2022-10-01  0.144894
1  2022-10-02  0.144879
2  2022-10-03  0.145486
3  2022-10-04  0.150735

And then one will be able to rename the columns to the desired output format pandas.DataFrame.rename

df = df.rename(columns={'index':'Date'})

[Out]:

         Date        NG
0  2022-10-01  0.144894
1  2022-10-02  0.144879
2  2022-10-03  0.145486
3  2022-10-04  0.150735

As it is still missing the base column, one can pick it up from the data as follows

df['base'] = data['data']['base']

[Out]:

         Date        NG base
0  2022-10-01  0.144894  EUR
1  2022-10-02  0.144879  EUR
2  2022-10-03  0.145486  EUR
3  2022-10-04  0.150735  EUR

Even though one already has the desired output, one can wrap everything into a one-liner as follows

df = pd.DataFrame.from_dict(data['data']['rates'], orient='index').reset_index().rename(columns={'index':'Date'}).assign(base=data['data']['base'])

[Out]:
         Date        NG base
0  2022-10-01  0.144894  EUR
1  2022-10-02  0.144879  EUR
2  2022-10-03  0.145486  EUR
3  2022-10-04  0.150735  EUR
Sign up to request clarification or add additional context in comments.

1 Comment

Plus one for detailed explanation.
1

Using what you had done, you just needed to specify the axis on which you want to concatenate the 2 dataframes by default pandas concatenated one below the other.

pd.concat(merge, axis=1) # default 0

1 Comment

Plus one for answering the question.
0

JSON is a very flexible format and no function in pandas can parse all types of JSON. You need to preprocess the JSON before constructing the dataframe:

with open("data.json") as fp:
    data = json.load(fp)

df = pd.DataFrame(
    [(date, rate["NG"]) for date, rate in data["data"]["rates"].items()],
    columns=["Date", "NG"],
).assign(base=data["data"]["base"])

1 Comment

That is some very long code of line. How do you even come up with it? But it works!
0

You were almost there. The reason you are getting NaNs is that you need to specify different axis when concatenating the dataframes: pd.concat(merge, axis=1).

Here is my version:

import json
import pandas as pd

with open("blue_file.json", "r") as f:
    data = json.loads(f.read())

df = pd.DataFrame.from_dict(data["data"]["rates"], orient="index")
df["base"] = data["data"]["base"]
df

Output:

                  NG base
2022-10-01  0.144894  EUR
2022-10-02  0.144879  EUR
2022-10-03  0.145486  EUR
2022-10-04  0.150735  EUR

Comments

0

simply:

import json
with open(r'path.json','r') as f:
    data = json.loads(f.read())

df=pd.DataFrame(data['data']).reset_index()
df=df.join(pd.json_normalize(df.pop('rates')))
print(df)

df=df[['index','NG','base']]
df=df.rename(columns={'index':'Date'})
'''
    index      success  timeseries  start_date  end_date    base    unit    NG
0   2022-10-01  true    true    2022-10-01  2022-10-04  EUR per MMBtu   0.1448939471560284
1   2022-10-02  true    true    2022-10-01  2022-10-04  EUR per MMBtu   0.14487923291390148
2   2022-10-03  true    true    2022-10-01  2022-10-04  EUR per MMBtu   0.1454857922753868
3   2022-10-04  true    true    2022-10-01  2022-10-04  EUR per MMBtu   0.1507352356663182

'''

2 Comments

Not going to downvote, but the output doesn't look quite the same as what the OP is asking.
I thought the person asking the question could filter the columns. But you're right, I'm updating.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.