This problem differs from those in the questions here, here, and here. Specifically, the transformation and the form of the output I want differ from anything specified in those questions, and I also want a DateTime index. These differences cause the answers on those pages to fail.

I have data formatted as JSON like this:

{
    "Meta Data": {
        "1. Information": "Daily Prices (open, high, low, close) and Volumes",
        "2. Symbol": "ABC"
    },
    "Time Series (Daily)": {
        "2001-06-31": {
            "1. open": "113.2000",
            "4. close": "113.8000"
        },
        "2001-07-01": {
            "1. open": "114.2000",
            "4. close": "114.2000"
        }
    }
}

I want the output to be a pandas DataFrame like this:

"Time Series (Daily)" | "1. open" | "4. close"
"2001-06-31"          | 113.2000  | 113.8000
"2001-07-01"          | 114.2000  | 114.2000

I wrote a function that works, but the for loop leaves performance wanting and I find it hard to read.

import pandas as pd

def convert_json_to_dataframe(all_json_data):
    json_data = all_json_data["Time Series (Daily)"]
    dates = []
    opens = []  # renamed from `open` to avoid shadowing the built-in
    closes = []
    for date, values in json_data.items():
        dates.append(date)
        opens.append(values["1. open"])
        closes.append(values["4. close"])
    df = pd.DataFrame(
        list(zip(opens, closes)),
        columns=["1. open", "4. close"],
        index=dates,
    )
    # Convert numeric strings to numbers; columns that cannot be parsed are left as-is
    df = df.apply(pd.to_numeric, errors="ignore")
    return df

There's got to be a simpler, easier to read, higher-performing way to do this, maybe with json_normalize in pandas, but I can't figure it out.


UPDATE AFTER ANSWERS & RESOLUTION. All I had to do was:

df = pd.DataFrame(json_data["Time Series (Daily)"]).T

Pandas discovered the index and column names automatically, so I didn't need the reset_index portion of the answers.

The orient approach also worked:

df = pd.DataFrame.from_dict(json_data["Time Series (Daily)"], orient="index")

To get all the numbers as floats instead of strings, I did need the apply line:

df = df.apply(pd.to_numeric, errors="coerce")
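For completeness, here is a minimal sketch that combines these steps with the DateTime index mentioned at the top. The `payload` variable and its dates are made up for illustration: the sample date "2001-06-31" above is not a real calendar date, so I would expect `pd.to_datetime` to reject it, and the sketch uses valid dates instead.

```python
import pandas as pd

# Hypothetical payload in the same shape as the JSON above.
# The dates are invented, valid calendar dates so the datetime
# conversion can succeed.
payload = {
    "Meta Data": {"2. Symbol": "ABC"},
    "Time Series (Daily)": {
        "2001-06-29": {"1. open": "113.2000", "4. close": "113.8000"},
        "2001-07-02": {"1. open": "114.2000", "4. close": "114.2000"},
    },
}

# Outer keys (dates) become the index, inner keys become the columns.
df = pd.DataFrame.from_dict(payload["Time Series (Daily)"], orient="index")

# Convert the string values to floats; unparseable values become NaN.
df = df.apply(pd.to_numeric, errors="coerce")

# Turn the string index into a DatetimeIndex and give it a name.
df.index = pd.to_datetime(df.index)
df.index.name = "Time Series (Daily)"

print(df.dtypes)
print(df.index)
```

Converting the index last keeps each step independent, so any one of them can be dropped if it is not needed.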

Thank you everyone.

1
  • Transposing a DataFrame is a very slow process. Constructing the DataFrame in the correct shape from the beginning using from_dict is more efficient. For example, for a DataFrame of shape (10000, 2), from_dict is ~4x faster than transposing. Commented May 11, 2022 at 5:26
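One way to check this claim on your own machine is a rough timing sketch like the one below. The synthetic data, key names, and helper function names are assumptions made for illustration; the ~4x figure is the commenter's, and actual timings will vary with the pandas version and hardware.

```python
import timeit

import pandas as pd

# Synthetic data (an assumption, not from the question): 10,000 outer keys,
# each mapping to two string-valued fields, which yields a (10000, 2) frame.
data = {
    f"row{i:05d}": {"1. open": str(100 + i), "4. close": str(101 + i)}
    for i in range(10_000)
}


def via_transpose():
    # Builds a wide (2, 10000) frame first, then flips it.
    return pd.DataFrame(data).T


def via_from_dict():
    # Builds the (10000, 2) frame directly.
    return pd.DataFrame.from_dict(data, orient="index")


# Time a few runs of each approach; only the relative difference matters.
t_transpose = timeit.timeit(via_transpose, number=3)
t_from_dict = timeit.timeit(via_from_dict, number=3)
print(f"transpose: {t_transpose:.3f}s  from_dict: {t_from_dict:.3f}s")
```

Both functions should produce a frame with the same shape, index, and columns, so the comparison isolates the cost of the construction path.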

3 Answers

1

Why don't you just do this?

pd.DataFrame(data["Time Series (Daily)"]).T.reset_index().rename(columns = {"index":"Time Series (Daily)"})

Output -

  Time Series (Daily)   1. open  4. close
0          2001-06-31  113.2000  113.8000
1          2001-07-01  114.2000  114.2000

1

It seems the relevant data is only under the "Time Series (Daily)" key, so you could extract that and construct a DataFrame from it (use the orient parameter to get it in the correct shape):

out = pd.DataFrame.from_dict(my_data['Time Series (Daily)'], orient='index')

Output:

             1. open  4. close
2001-06-31  113.2000  113.8000
2001-07-01  114.2000  114.2000
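If the dates should also appear under the "Time Series (Daily)" header shown in the question, one possible follow-up is to name the index and, optionally, move it into a regular column. This is a sketch using the question's sample data; the `flat` variable is a name introduced here for illustration.

```python
import pandas as pd

# Sample data copied from the question's "Time Series (Daily)" block.
my_data = {
    "Time Series (Daily)": {
        "2001-06-31": {"1. open": "113.2000", "4. close": "113.8000"},
        "2001-07-01": {"1. open": "114.2000", "4. close": "114.2000"},
    }
}

out = pd.DataFrame.from_dict(my_data["Time Series (Daily)"], orient="index")

# Label the index so it carries the header the question asks for.
out = out.rename_axis("Time Series (Daily)")

# Optional: move the dates out of the index into an ordinary column.
flat = out.reset_index()
print(flat.columns.tolist())
# ['Time Series (Daily)', '1. open', '4. close']
```

Keeping the dates in the index (`out`) is usually more convenient for time-based lookups, while `flat` matches the three-column layout in the question.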


1

Taking the JSON from a URL is the easiest way:

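import pandas as pd
import requests

url = 'url from json file'  # placeholder: replace with the actual JSON endpoint
r = requests.get(url)
r.raise_for_status()  # fail early on an HTTP error
data = r.json()
df = pd.DataFrame(data['Time Series (Daily)']).T
# Name the index, then move it into a regular column
df = df.rename_axis('Time Series (Daily)').reset_index()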
