I am trying to convert nested JSON into CSV using pandas. I have looked at similar questions asked here, but I can't seem to apply them to my scenario. My JSON is the following:

{
 "51% FIFTY ONE PERCENT(PWD)" : {
 "ID" : "51%1574233975114-WEBAD",
 "contactName" : "",
 "createdAt" : 1574233975,
 "debit" : 118268.19999999995,
 "defaultCompany" : "",
 "emailAddress" : "",
 "lastUpdatedAt" : "",
 "phoneNumber" : "",
 "taskNumber" : 0
},
 "51% STORE (MUZ)" : {
 "ID" : "51%1576650784631-WEBAD",
 "contactName" : "",
 "createdAt" : 1576650784,
 "debit" : 63860,
 "defaultCompany" : "",
 "emailAddress" : "",
 "lastUpdatedAt" : "",
 "phoneNumber" : "",
 "taskNumber" : 0
},
 "ABBOTT S" : {
  "STORE (ABD)" : {
   "ID" : "ABB1574833257715-WEBAD",
   "contactName" : "",
   "createdAt" : 1574833257,
   "debit" : 35065,
   "defaultCompany" : "",
   "emailAddress" : "",
   "lastUpdatedAt" : "",
   "phoneNumber" : "",
   "taskNumber" : 0
 }
}
}

This is a snippet of the JSON and, as you can see, some entries (not all) are nested one level deeper. I tried using json_normalize the following way:

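import json
import pandas as pd

# Raw string so the backslashes in the Windows path are not read as escapes
with open(r'.\Customers\kontrolkotlin-CUSTOMERS-export.json') as f:
    d = json.load(f)

# pandas 1.0+ exposes json_normalize at the top level
nycphil = pd.json_normalize(data=d)
nycphil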

This gave me a single-row DataFrame as output. That doesn't work for me, as I want something readable and understandable.
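
For reference, the single row can be reproduced with a shortened copy of the data above. This is only a sketch, assuming pandas 1.0 or later (where the function is available as pd.json_normalize); the trimmed dict d is illustrative, not the full export. Because a single dict is treated as one record, every nested key is flattened into its own dot-separated column:

```python
import pandas as pd

# Shortened, illustrative copy of the JSON in the question
d = {
    "51% STORE (MUZ)": {"ID": "51%1576650784631-WEBAD", "debit": 63860},
    "ABBOTT S": {
        "STORE (ABD)": {"ID": "ABB1574833257715-WEBAD", "debit": 35065}
    },
}

# The whole dict counts as ONE record, so the result has one row
# and one column per leaf value, named by the keys leading to it.
wide = pd.json_normalize(d)
print(wide.shape)
print(list(wide.columns))
```

Each customer ends up spread across several columns instead of occupying a row of its own, which is why the output is hard to read.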

2 Answers

1

I'm sure there's a simpler way, but...

If you assume that the leaves of your nested JSON all have the same fields (ID, contactName, etc.), then you can recursively flatten your JSON into a list of records, keeping the path of keys that led to each leaf.

Something like:

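import pandas as pd

def flatten_json(x, path="", result=None):
    # Collect every leaf record (a dict with an "ID" key) plus the key path to it
    if result is None:
        result = []
    if "ID" in x:
        result.append({**x, "path": path})
        return result  # return the list so a top-level leaf also works
    for key in x:
        flatten_json(x[key], path + "/" + key, result)
    return result

# d is the dict loaded with json.load in the question
df = pd.DataFrame(flatten_json(d))
print(df)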

result:

                       ID contactName   createdAt     debit defaultCompany  \
0  51%1574233975114-WEBAD              1574233975  118268.2                  
1  51%1576650784631-WEBAD              1576650784   63860.0                  
2  ABB1574833257715-WEBAD              1574833257   35065.0                  

  emailAddress lastUpdatedAt phoneNumber  taskNumber  \
0                                                  0   
1                                                  0   
2                                                  0   

                          path  
0  /51% FIFTY ONE PERCENT(PWD)  
1             /51% STORE (MUZ)  
2        /ABBOTT S/STORE (ABD)  
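
Since the goal was a CSV, the flattened records can then be passed to to_csv. The following is a minimal, self-contained sketch: the shortened data dict and the extra name column are illustrative assumptions, not part of the original export.

```python
import pandas as pd

# Shortened, illustrative copy of the JSON in the question
data = {
    "51% STORE (MUZ)": {"ID": "51%1576650784631-WEBAD", "debit": 63860},
    "ABBOTT S": {
        "STORE (ABD)": {"ID": "ABB1574833257715-WEBAD", "debit": 35065}
    },
}


def flatten_json(x, path="", result=None):
    """Collect every dict containing "ID", recording the keys that led to it."""
    if result is None:
        result = []
    if "ID" in x:
        result.append({**x, "path": path})
        return result
    for key in x:
        flatten_json(x[key], path + "/" + key, result)
    return result


df = pd.DataFrame(flatten_json(data))

# Optional: a name column that is the path without its leading slash
df["name"] = df["path"].str.lstrip("/")

# Without a path argument, to_csv returns the CSV as a string
csv_text = df.to_csv(index=False)
print(csv_text)
```

To write a file instead, pass a path as the first argument, e.g. df.to_csv("customers.csv", index=False), where "customers.csv" is a hypothetical file name.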

2 Comments

I can't understand the parse_json function you mentioned there
Sorry, @ZohairAbbasHadi that was the wrong function name. I fixed the code now.
0

You can normalize a single entry directly, for example:

import pandas as pd

nycphil = pd.json_normalize(d['51% STORE (MUZ)'])
print(nycphil.head(3))

Or read the file straight into a DataFrame and write it out as CSV:

df = pd.read_json('some.json')
df.to_csv('some.csv')  # placeholder file name; to_csv() without a path only returns the CSV text
print(df)

1 Comment

But there is a problem with your answer: the last row shows the nested JSON as it is.
