Missing JSON information using Python

Question

I am struggling to get some JSON information to appear in a DataFrame column.

The information is 'data_at': 1619293080600

Here is what I have so far:

import json
import requests

requestT = requests.get('https:............')
json_dataT = json.loads(requestT.text)

print(json_dataT)

Output:

{'data_at': 1619293080600, 'data': {'london_NW': {'loc_postcode': 'NW1', 'loc_name': 'camden_twn', 'ave_price': '1061227.00'}, 'london_SW': {'loc_postcode': 'SW1', 'loc_name': 'victoria', 'ave_price': '1878130.00'}}}

I then transform this into a dataframe via the following method:

df = pd.DataFrame(json_dataT)
dfNormal = json_normalize(df['data'])

However, I lose the 'data_at' value, a timestamp that I want in column 0. What I get is the following:

        loc_postcode          loc_name              ave_price
0                NW1        camden_twn             1061227.00
1                SW1          victoria             1878130.00

How can I get the 'data_at' (timestamp) to appear as the first column?
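
The loss can be reproduced without the HTTP request. This is a minimal sketch that hard-codes the payload printed above in place of the real response, and it assumes pandas 1.0 or later, where json_normalize is available as pd.json_normalize. It suggests that 'data_at' is still present in df and is only dropped because the normalization step is given the 'data' column alone.

```python
import pandas as pd

# Payload copied from the printed output above; it stands in for the real response.
json_dataT = {
    'data_at': 1619293080600,
    'data': {
        'london_NW': {'loc_postcode': 'NW1', 'loc_name': 'camden_twn', 'ave_price': '1061227.00'},
        'london_SW': {'loc_postcode': 'SW1', 'loc_name': 'victoria', 'ave_price': '1878130.00'},
    },
}

df = pd.DataFrame(json_dataT)
# The scalar 'data_at' is broadcast to every row, so the column exists here.
print(df.columns.tolist())

# Only the nested records are normalized; 'data_at' is not part of them.
dfNormal = pd.json_normalize(df['data'].tolist())
print(dfNormal.columns.tolist())
```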

Nikolaos Chatzis · Accepted Answer · 2021-04-24 22:24:36Z

1

One way to get the desired result is to flatten json_dataT first, since the keys of json_dataT['data'] ('london_NW' and 'london_SW') do not appear in the desired output.

Specifically, drop that level by replacing the dict with a list of its values:

>>> json_dataT['data'] = list(json_dataT['data'].values())
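
To make the effect of that line concrete, here is a small sketch using the payload from the question. It also shows a variant that keeps each key as a field instead of discarding it; the field name 'region' is my own assumption and is not part of the original data.

```python
# Payload copied from the printed output in the question.
json_dataT = {
    'data_at': 1619293080600,
    'data': {
        'london_NW': {'loc_postcode': 'NW1', 'loc_name': 'camden_twn', 'ave_price': '1061227.00'},
        'london_SW': {'loc_postcode': 'SW1', 'loc_name': 'victoria', 'ave_price': '1878130.00'},
    },
}

# Variant that keeps each key as a field; 'region' is a hypothetical name.
records_with_region = [{'region': key, **rec} for key, rec in json_dataT['data'].items()]

# The step from the answer: the dict keyed by area becomes a plain list of records.
json_dataT['data'] = list(json_dataT['data'].values())

print(json_dataT['data'])
print(records_with_region)
```

The list form is what record_path expects in the next step, since json_normalize expands a list of records into rows.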

Then apply json_normalize, using record_path to select the list of records and meta to repeat data_at on every row:

>>> df_normal = json_normalize(json_dataT, record_path='data', meta='data_at')
>>> df_normal
  loc_postcode    loc_name   ave_price        data_at
0          NW1  camden_twn  1061227.00  1619293080600
1          SW1    victoria  1878130.00  1619293080600
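
For anyone who wants to run this end to end, here is a self-contained sketch. It assumes pandas 1.0 or later (where the function is exposed as pd.json_normalize) and hard-codes the payload from the question instead of making the request. The name payload is mine.

```python
import pandas as pd

# Payload copied from the printed output in the question.
json_dataT = {
    'data_at': 1619293080600,
    'data': {
        'london_NW': {'loc_postcode': 'NW1', 'loc_name': 'camden_twn', 'ave_price': '1061227.00'},
        'london_SW': {'loc_postcode': 'SW1', 'loc_name': 'victoria', 'ave_price': '1878130.00'},
    },
}

# Same flattening step, done on a copy so the original payload is left untouched.
payload = {**json_dataT, 'data': list(json_dataT['data'].values())}

# record_path picks the list to expand into rows;
# meta names the top-level field to repeat on each row.
df_normal = pd.json_normalize(payload, record_path='data', meta='data_at')
print(df_normal)
```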

Finally, reorder the columns to make data_at the first column:

>>> cols = df_normal.columns.tolist()
>>> cols = cols[-1:] + cols[:-1]
>>> df_normal = df_normal[cols]
>>> df_normal
         data_at loc_postcode    loc_name   ave_price
0  1619293080600          NW1  camden_twn  1061227.00
1  1619293080600          SW1    victoria  1878130.00
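
The slicing above relies on data_at being the last column. As an alternative sketch, the column can be moved to the front by name, which should hold up if more meta fields are added later. The frame below is built by hand from the output shown above, and the variable first is just a helper name of mine.

```python
import pandas as pd

# Frame shaped like the json_normalize result above (values copied from that output).
df_normal = pd.DataFrame({
    'loc_postcode': ['NW1', 'SW1'],
    'loc_name': ['camden_twn', 'victoria'],
    'ave_price': ['1061227.00', '1878130.00'],
    'data_at': [1619293080600, 1619293080600],
})

# Put 'data_at' first by name, keeping the other columns in their existing order.
first = 'data_at'
df_normal = df_normal[[first] + [c for c in df_normal.columns if c != first]]
print(df_normal.columns.tolist())
```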

1 Comment

windwalker Over a year ago

Just saw this a few minutes ago. It worked like a charm. Thanks buddy

mechanical_meat · Accepted Answer · 2021-04-24 20:55:13Z

0

Because of the way df is constructed, its index will not match the index of your .json_normalize() result: df is indexed by Index(['london_NW', 'london_SW'], dtype='object'), while the normalized frame shown in the question is indexed 0, 1.
To get around this, call .reset_index(drop=True) on df and then concatenate horizontally using pd.concat() with axis=1:

df.reset_index(drop=True, inplace=True)
df_normal = pd.concat([df['data_at'], pd.json_normalize(df['data'])], axis=1)

Result:

In [63]: df_normal
Out[63]: 
         data_at loc_postcode    loc_name   ave_price
0  1619293080600          NW1  camden_twn  1061227.00
1  1619293080600          SW1    victoria  1878130.00
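
Here is a self-contained sketch of the same idea, assuming pandas 1.0 or later and the payload from the question hard-coded in place of the request. Two details are my own choices and not part of the snippet above: the records are passed to pd.json_normalize as a plain list, and reset_index is used without inplace. The name normalized is mine.

```python
import pandas as pd

# Payload copied from the printed output in the question.
json_dataT = {
    'data_at': 1619293080600,
    'data': {
        'london_NW': {'loc_postcode': 'NW1', 'loc_name': 'camden_twn', 'ave_price': '1061227.00'},
        'london_SW': {'loc_postcode': 'SW1', 'loc_name': 'victoria', 'ave_price': '1878130.00'},
    },
}

df = pd.DataFrame(json_dataT)
print(df.index.tolist())  # the keys of 'data' become the row labels

# Normalize from a plain list so the result gets a default integer index.
normalized = pd.json_normalize(df['data'].tolist())

# Drop the row labels so both frames share the same integer index.
df = df.reset_index(drop=True)

# Align on the shared index and place the columns side by side.
df_normal = pd.concat([df['data_at'], normalized], axis=1)
print(df_normal)
```

Note that reset_index(..., inplace=True) returns None, so its result should not be assigned back to df.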

1 Comment

windwalker Over a year ago

I get an error of TypeError: 'NoneType' object is not subscriptable
