Although a few other examples of converting nested JSON to a pandas dataframe can be found, I cannot find one that covers this case, so I have not succeeded. I have a nested JSON as follows:

{'x':
     {'1':[2,5,6],'2':[7,6]},
 'y':
     {'1':[0,4,8],'2':[8,1]},
 'z':
     {'1':[8,0,9],'2':[2,2]}}

and I would like a dataframe as:

   1_0 1_1 1_2 2_0 2_1
x   2   5   6   7   6
y   0   4   8   8   1
z   8   0   9   2   2

The labelling of the columns does not necessarily have to be exactly this, as long as I can infer the data correctly.

I have tried this:

import gzip
import json
import pandas as pd

with gzip.open('example.json') as f:
    d = json.load(f)

df = pd.json_normalize(d)
df

resulting in this:

[screenshot of the resulting dataframe]

1 Answer

One way is to first use DataFrame.from_dict to load each inner list as a single cell value, then expand every column into its own dataframe and concat the pieces:

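import pandas as pd

d = {'x':
     {'1':[2,5,6],'2':[7,6]},
 'y':
     {'1':[0,4,8],'2':[8,1]},
 'z':
     {'1':[8,0,9],'2':[2,2]}}

# Rows are x/y/z, columns are '1' and '2'; each cell holds a list
df = pd.DataFrame.from_dict(d, orient="index")
# Expand every list column into its own frame, then join them side by side
df2 = pd.concat([pd.DataFrame(df[i].values.tolist(),
                              columns=[f"{i}_{num}" for num in range(len(df[i].iat[0]))]
                              ) for i in df.columns], axis=1)

print(df2)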

   1_0  1_1  1_2  2_0  2_1
0    2    5    6    7    6
1    0    4    8    8    1
2    8    0    9    2    2
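
Note that this version shows 0/1/2 instead of x/y/z, because each intermediate frame is built from plain lists and gets a default index. A minimal sketch of keeping the row labels, assuming the same d as above (the names pieces and df2_labeled are only illustrative), is to pass index=df.index when building each piece:

```python
import pandas as pd

d = {'x': {'1': [2, 5, 6], '2': [7, 6]},
     'y': {'1': [0, 4, 8], '2': [8, 1]},
     'z': {'1': [8, 0, 9], '2': [2, 2]}}

df = pd.DataFrame.from_dict(d, orient="index")

pieces = []
for col in df.columns:
    values = df[col].tolist()   # one list per row
    width = len(values[0])      # assumes every row has the same length
    pieces.append(pd.DataFrame(values,
                               index=df.index,  # reuse the x/y/z labels
                               columns=[f"{col}_{n}" for n in range(width)]))

df2_labeled = pd.concat(pieces, axis=1)
print(df2_labeled)
```

The result should hold the same values as df2, with rows labelled x, y and z.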

Alternatively, use chain.from_iterable to flatten each row's lists first:

from itertools import chain

print (pd.DataFrame([chain.from_iterable(i.values()) for i in d.values()],
                    index=d.keys(),
                    columns=[f"{k}_{num}" for k, v in list(d.values())[0].items()
                             for num in range(len(v))]))

   1_0  1_1  1_2  2_0  2_1
x    2    5    6    7    6
y    0    4    8    8    1
z    8    0    9    2    2
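
Both versions take the column layout from the first entry, so they assume every top-level key has the same inner keys and the same list lengths. If that might not hold for the real file, one possible sketch is to build a flat record per row and let pandas align the columns. The helper name flatten_rows is hypothetical, and this has not been timed against a large file:

```python
import pandas as pd

def flatten_rows(nested):
    """Turn {row: {key: [v0, v1, ...]}} into a frame with key_pos columns."""
    rows = {}
    for row_label, inner in nested.items():
        # One flat dict per row, e.g. {'1_0': 2, '1_1': 5, ...}
        rows[row_label] = {f"{key}_{pos}": value
                           for key, values in inner.items()
                           for pos, value in enumerate(values)}
    # Positions missing from a row are filled with NaN by pandas
    return pd.DataFrame.from_dict(rows, orient="index")

d = {'x': {'1': [2, 5, 6], '2': [7, 6]},
     'y': {'1': [0, 4, 8], '2': [8, 1]},
     'z': {'1': [8, 0, 9], '2': [2, 2]}}

flat = flatten_rows(d)
print(flat)
```

With the example d this should give the same labels and values as the chain version. For rows of unequal length, the shorter rows get NaN in the extra columns, which also turns those columns into floats.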

8 Comments

Thank you for your answer, this certainly works, but I am looking for a highly optimised way because my real JSON file is > 200MB!
@pr94 sure. I added an alternative method for you to test.
@HenryYik - columns names are different
@jezrael Yes, but he also said "the labelling of the columns do not necessarily have to be exactly this", so :)
@HenryYik yes, but if you have an easy way to implement it, it would be preferred. It is nice that you have the row labeling also in the second option!