51

I need to load the contents of a JSON file into a pandas DataFrame in a specific layout so that I can run pandasql on it to transform the data and then feed it to a scoring model.

file = C:\scoring_model\json.js (contents of 'file' are below)

{
"response":{
  "version":"1.1",
  "token":"dsfgf",
   "body":{
     "customer":{
         "customer_id":"1234567",
         "verified":"true"
       },
     "contact":{
         "email":"[email protected]",
         "mobile_number":"0123456789"
      },
     "personal":{
         "gender": "m",
         "title":"Dr.",
         "last_name":"Muster",
         "first_name":"Max",
         "family_status":"single",
         "dob":"1985-12-23"
     }
   }
 }
}

I need the DataFrame to look like this (all values belong on a single row; the table is split in two here only to fit the question):

version | token | customer_id | verified | email      | mobile_number | gender |
1.1     | dsfgf | 1234567     | true     | [email protected] | 0123456789    | m      |

title | last_name | first_name |family_status | dob
Dr.   | Muster    | Max        | single       | 23.12.1985

I have looked at the other questions on this topic and have tried various ways to load the JSON file into pandas:

with open(r'C:\scoring_model\json.js', 'r') as f:
    c = pd.read_json(f.read())

with open(r'C:\scoring_model\json.js', 'r') as f:
    c = f.readlines()

I also tried pd.Panel(), following the solution in "Python Pandas: How to split a sorted dictionary in a column of a dataframe", on the DataFrame built from the readlines() result [yo = f.readlines()]. I thought about splitting the contents of each cell on the double quotes ("") and putting the pieces into separate columns, but no luck so far.

3 Answers

70

If you load the entire JSON as a dict (or list), e.g. using json.load, you can use json_normalize:

In [11]: d = {"response": {"body": {"contact": {"email": "[email protected]", "mobile_number": "0123456789"}, "personal": {"last_name": "Muster", "gender": "m", "first_name": "Max", "dob": "1985-12-23", "family_status": "single", "title": "Dr."}, "customer": {"verified": "true", "customer_id": "1234567"}}, "token": "dsfgf", "version": "1.1"}}

In [12]: df = pd.json_normalize(d)

In [13]: df.columns = df.columns.map(lambda x: x.split(".")[-1])

In [14]: df
Out[14]:
        email mobile_number customer_id verified         dob family_status first_name gender last_name title  token version
0  [email protected]    0123456789     1234567     true  1985-12-23        single        Max      m    Muster   Dr.  dsfgf     1.1
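As a rough end-to-end sketch of the same approach, the snippet below also reorders the columns and rewrites dob as DD.MM.YYYY, as the question's target layout shows. It assumes the JSON is parsed from an inline string rather than the file, uses a placeholder e-mail address, and the name `wanted` is only illustrative.

```python
import json
import pandas as pd

# Inline stand-in for the file contents; the e-mail address is a placeholder.
raw = """
{"response": {"version": "1.1", "token": "dsfgf",
  "body": {
    "customer": {"customer_id": "1234567", "verified": "true"},
    "contact": {"email": "max@example.com", "mobile_number": "0123456789"},
    "personal": {"gender": "m", "title": "Dr.", "last_name": "Muster",
                 "first_name": "Max", "family_status": "single",
                 "dob": "1985-12-23"}}}}
"""

df = pd.json_normalize(json.loads(raw))

# Keep only the last path component,
# e.g. "response.body.contact.email" -> "email".
df.columns = [col.split(".")[-1] for col in df.columns]

# Rewrite the ISO date as DD.MM.YYYY (the column stays a string column).
df["dob"] = pd.to_datetime(df["dob"]).dt.strftime("%d.%m.%Y")

# Reorder to the layout asked for in the question.
wanted = ["version", "token", "customer_id", "verified", "email",
          "mobile_number", "gender", "title", "last_name", "first_name",
          "family_status", "dob"]
df = df[wanted]

print(df.iloc[0].to_dict())
```

Taking only the last path component works here because every leaf key is unique; if two nested objects shared a key name, the shortened column names would collide.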

2

It's much easier if you deserialize the JSON using the built-in json module first (instead of pd.read_json()) and then flatten it using pd.json_normalize().

import json
import pandas as pd

# deserialize
with open(r'C:\scoring_model\json.js', 'r') as f:
    data = json.load(f)

# flatten
df = pd.json_normalize(data)

If a dictionary is passed to json_normalize(), it's flattened into a single row, but if a list is passed to it, it's flattened into multiple rows. So if the nested structure contains only key-value pairs, pd.json_normalize() with no parameters suffices to flatten it.
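A minimal illustration of the dict-versus-list behaviour, using a small made-up record (the variable names are only for this sketch):

```python
import pandas as pd

record = {"customer": {"customer_id": "1234567", "verified": "true"}}

# A single dict is flattened into one row.
one_row = pd.json_normalize(record)

# A list of dicts is flattened into one row per list item.
two_rows = pd.json_normalize([record, record])

print(one_row.shape)          # (1, 2)
print(two_rows.shape)         # (2, 2)
print(list(one_row.columns))  # ['customer.customer_id', 'customer.verified']
```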


However, if the data contains a list (a JSON array somewhere in the nesting), pass the record_path= argument to tell pandas where the records are. For example, suppose the data looks like the following (notice that the value under "body" is a list, i.e. a list of records):

data = {
    "response":[
        {
            "version":"1.1",
            "customer": {"id": "1234567", "verified":"true"},
            "body":[
                {"email":"[email protected]", "mobile_number":"0123456789"},
                {"email":"[email protected]", "mobile_number":"9876543210"}
            ]
        }, 
        {
            "version":"1.2",
            "customer": {"id": "0987654", "verified":"true"},
            "body":[
                {"email":"[email protected]", "mobile_number":"9999999999"}
            ]
        }
    ]
}

Then you can pass record_path= to tell pandas that the records are under "body", and meta= to give the paths to the metadata. Note that "body", "version" and "customer" are at the same level in the data, but "id" is nested one level deeper, so you need to pass a list to reach the value under "id".

df = pd.json_normalize(data['response'], record_path=['body'], meta=['version', ['customer', 'id']])
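To see what that call produces, here is a self-contained run on the same structure. The e-mail addresses are placeholders chosen for this sketch; everything else follows the example data above.

```python
import pandas as pd

data = {
    "response": [
        {
            "version": "1.1",
            "customer": {"id": "1234567", "verified": "true"},
            "body": [
                {"email": "a@example.com", "mobile_number": "0123456789"},
                {"email": "b@example.com", "mobile_number": "9876543210"},
            ],
        },
        {
            "version": "1.2",
            "customer": {"id": "0987654", "verified": "true"},
            "body": [
                {"email": "c@example.com", "mobile_number": "9999999999"},
            ],
        },
    ]
}

df = pd.json_normalize(
    data["response"],
    record_path=["body"],                   # one output row per item in "body"
    meta=["version", ["customer", "id"]],   # parent fields repeated on each row
)

print(list(df.columns))  # ['email', 'mobile_number', 'version', 'customer.id']
print(len(df))           # 3
```

Each record under "body" becomes its own row, and the meta fields of the enclosing response entry are copied onto every row that came from it, so the first two rows carry version "1.1" and the third carries "1.2".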


0

Your file = C:\scoring_model\json.js

The contents of the file are:

{
"response":{
  "version":"1.1",
  "token":"dsfgf",
   "body":{
     "customer":{
         "customer_id":"1234567",
         "verified":"true"
       },
     "contact":{
         "email":"[email protected]",
         "mobile_number":"0123456789"
      },
     "personal":{
         "gender": "m",
         "title":"Dr.",
         "last_name":"Muster",
         "first_name":"Max",
         "family_status":"single",
         "dob":"1985-12-23"
     }
   }
 }
}

The code is below:

import json
import pandas as pd


with open(r'C:\scoring_model\json.js') as f:
    json_data = json.load(f)

# The only top-level key is "response"; pick each field out of its value by hand.
for key, val in json_data.items():
    dic = {
        'version': val['version'],
        'token': val['token'],
        'customer_id': val['body']['customer']['customer_id'],
        'verified': val['body']['customer']['verified'],
        'email': val['body']['contact']['email'],
        'mobile_number': val['body']['contact']['mobile_number'],
        'last_name': val['body']['personal']['last_name'],
        'first_name': val['body']['personal']['first_name'],
        'family_status': val['body']['personal']['family_status'],
        'dob': val['body']['personal']['dob'],
    }

# Wrap the flat dict in a one-row DataFrame.
df = pd.DataFrame(dic, index=[0])
df

The output is:

version  token customer_id verified       email  mobile_number  last_name  first_name family_status         dob
 1.1  dsfgf     1234567     true  [email protected]    0123456789    Muster       Max        single  1985-12-23 
