Nested Json to pandas DataFrame with specific format

Question

I need to load the contents of a JSON file into a pandas DataFrame in a specific layout so that I can use pandasql to transform the data and run it through a scoring model.

file = C:\scoring_model\json.js (contents of 'file' are below)

{
"response":{
  "version":"1.1",
  "token":"dsfgf",
   "body":{
     "customer":{
         "customer_id":"1234567",
         "verified":"true"
       },
     "contact":{
         "email":"[email protected]",
         "mobile_number":"0123456789"
      },
     "personal":{
         "gender": "m",
         "title":"Dr.",
         "last_name":"Muster",
         "first_name":"Max",
         "family_status":"single",
         "dob":"1985-12-23"
     }
   }
 }
}

I need the DataFrame to look like this (all values belong on a single row; it is split in two here only to fit the question):

version | token | customer_id | verified | email      | mobile_number | gender |
1.1     | dsfgf | 1234567     | true     | [email protected] | 0123456789    | m      |

title | last_name | first_name |family_status | dob
Dr.   | Muster    | Max        | single       | 23.12.1985

I have looked at the other questions on this topic and have tried various ways to load the JSON file into pandas:

with open(r'C:\scoring_model\json.js', 'r') as f:
    c = pd.read_json(f.read())

with open(r'C:\scoring_model\json.js', 'r') as f:
    c = f.readlines()

I also tried pd.Panel(), following the solution in "Python Pandas: How to split a sorted dictionary in a column of a dataframe", using the DataFrame built from yo = f.readlines(). I thought about splitting the contents of each cell on the quote character (") and finding a way to put the pieces into different columns, but no luck so far.

Curious Watcher · Accepted Answer · 2020-06-08 20:04:15Z

70

If you load the entire JSON as a dict (or list), e.g. using json.load, you can use pd.json_normalize:

In [11]: d = {"response": {"body": {"contact": {"email": "[email protected]", "mobile_number": "0123456789"}, "personal": {"last_name": "Muster", "gender": "m", "first_name": "Max", "dob": "1985-12-23", "family_status": "single", "title": "Dr."}, "customer": {"verified": "true", "customer_id": "1234567"}}, "token": "dsfgf", "version": "1.1"}}

In [12]: df = pd.json_normalize(d)

In [13]: df.columns = df.columns.map(lambda x: x.split(".")[-1])

In [14]: df
Out[14]:
        email mobile_number customer_id verified         dob family_status first_name gender last_name title  token version
0  [email protected]    0123456789     1234567     true  1985-12-23        single        Max      m    Muster   Dr.  dsfgf     1.1
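To get from this output to the exact layout requested in the question, the columns can be reordered and the date reformatted. The following is only a sketch under a few assumptions: the JSON is inlined as a string rather than read from the file, the email value is kept exactly as it appears in this post, and the name `wanted` is introduced here purely for illustration.

```python
import json

import pandas as pd

# Payload from the question, inlined as a string instead of read from disk.
raw = """
{"response": {"version": "1.1", "token": "dsfgf", "body": {
    "customer": {"customer_id": "1234567", "verified": "true"},
    "contact": {"email": "[email protected]", "mobile_number": "0123456789"},
    "personal": {"gender": "m", "title": "Dr.", "last_name": "Muster",
                 "first_name": "Max", "family_status": "single",
                 "dob": "1985-12-23"}}}}
"""

df = pd.json_normalize(json.loads(raw))

# Keep only the last path component: "response.body.contact.email" -> "email".
df.columns = [col.split(".")[-1] for col in df.columns]

# Reformat the ISO date as DD.MM.YYYY (the column stays a string column).
df["dob"] = pd.to_datetime(df["dob"]).dt.strftime("%d.%m.%Y")

# Column order requested in the question (illustrative name).
wanted = ["version", "token", "customer_id", "verified", "email",
          "mobile_number", "gender", "title", "last_name", "first_name",
          "family_status", "dob"]
df = df[wanted]

print(df.to_string(index=False))
```

Keeping only the last path component works here because every leaf name is unique; if two branches shared a leaf name, the columns would collide and the full dotted names would be the safer choice.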

answered Dec 17, 2015 at 19:15 by Andy Hayden · edited Jun 8, 2020 at 20:04 by Curious Watcher

cottontail · Accepted Answer · 2023-02-05 18:44:49Z

It's much easier if you deserialize the JSON using the built-in json module first (instead of pd.read_json()) and then flatten it using pd.json_normalize().

import json
import pandas as pd

# deserialize
with open(r'C:\scoring_model\json.js', 'r') as f:
    data = json.load(f)

# flatten
df = pd.json_normalize(data)

If a dictionary is passed to json_normalize(), it's flattened into a single row, but if a list is passed to it, it's flattened into multiple rows. So if the nested structure contains only key-value pairs, pd.json_normalize() with no parameters suffices to flatten it.
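A small sketch of that difference, using a cut-down record based on the question's data (the variable names `record`, `one_row` and `two_rows` are illustrative only):

```python
import pandas as pd

record = {"customer": {"customer_id": "1234567", "verified": "true"}}

# A single dict is flattened into a single row.
one_row = pd.json_normalize(record)

# A list of dicts is flattened into one row per element.
two_rows = pd.json_normalize([record, record])

print(one_row.columns.tolist())  # ['customer.customer_id', 'customer.verified']
print(one_row.shape)             # (1, 2)
print(two_rows.shape)            # (2, 2)
```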

However, if the data contains a list (a JSON array somewhere in the nesting), pass the record_path= argument to tell pandas where the records are. For example, suppose the data looks like the following (notice that the value under "body" is a list, i.e. a list of records):

data = {
    "response":[
        {
            "version":"1.1",
            "customer": {"id": "1234567", "verified":"true"},
            "body":[
                {"email":"[email protected]", "mobile_number":"0123456789"},
                {"email":"[email protected]", "mobile_number":"9876543210"}
            ]
        }, 
        {
            "version":"1.2",
            "customer": {"id": "0987654", "verified":"true"},
            "body":[
                {"email":"[email protected]", "mobile_number":"9999999999"}
            ]
        }
    ]
}

then you can pass record_path= to tell pandas that the records are under "body", and pass meta= to set the paths to the metadata. Note that "body", "version" and "customer" are at the same level in the data, but "id" is nested one level deeper, so you need to pass a list as the path to the value under "id".

df = pd.json_normalize(data['response'], record_path=['body'], meta=['version', ['customer', 'id']])
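As a runnable sketch of the call above, with the sample data inlined (the email strings are kept exactly as they appear in this post), the result should have one row per entry under "body", with the metadata repeated on each row:

```python
import pandas as pd

data = {
    "response": [
        {
            "version": "1.1",
            "customer": {"id": "1234567", "verified": "true"},
            "body": [
                {"email": "[email protected]", "mobile_number": "0123456789"},
                {"email": "[email protected]", "mobile_number": "9876543210"},
            ],
        },
        {
            "version": "1.2",
            "customer": {"id": "0987654", "verified": "true"},
            "body": [
                {"email": "[email protected]", "mobile_number": "9999999999"},
            ],
        },
    ]
}

df = pd.json_normalize(
    data["response"],
    record_path=["body"],                  # one output row per item in "body"
    meta=["version", ["customer", "id"]],  # nested meta path given as a list
)

print(df.columns.tolist())  # ['email', 'mobile_number', 'version', 'customer.id']
print(df.shape)             # (3, 4)
```

The nested meta path ['customer', 'id'] ends up as the column "customer.id", because the path components are joined with the default separator ".".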

Shahidul Islam Molla · Accepted Answer · 2024-10-17 15:51:29Z

Your file is C:\scoring_model\json.js

The contents of the file are:

{
"response":{
  "version":"1.1",
  "token":"dsfgf",
   "body":{
     "customer":{
         "customer_id":"1234567",
         "verified":"true"
       },
     "contact":{
         "email":"[email protected]",
         "mobile_number":"0123456789"
      },
     "personal":{
         "gender": "m",
         "title":"Dr.",
         "last_name":"Muster",
         "first_name":"Max",
         "family_status":"single",
         "dob":"1985-12-23"
     }
   }
 }
}

The code is below:

import json
import pandas as pd

# Load the whole file into a dict.
with open(r'C:\scoring_model\json.js') as f:
    json_data = json.load(f)

# The only top-level key is "response"; pick the wanted fields by hand.
for key, val in json_data.items():
    body = val['body']
    dic = {
        'version': val['version'],
        'token': val['token'],
        'customer_id': body['customer']['customer_id'],
        'verified': body['customer']['verified'],
        'email': body['contact']['email'],
        'mobile_number': body['contact']['mobile_number'],
        'last_name': body['personal']['last_name'],
        'first_name': body['personal']['first_name'],
        'family_status': body['personal']['family_status'],
        'dob': body['personal']['dob'],
    }

# All values are scalars, so an explicit index is needed for a one-row frame.
df = pd.DataFrame(dic, index=[0])
df

The output is:

   version  token  customer_id  verified              email  mobile_number  last_name  first_name  family_status         dob
0      1.1  dsfgf      1234567      true  [email protected]     0123456789     Muster         Max         single  1985-12-23
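If listing every field by hand becomes tedious, the same one-row result can be built with a small recursive helper. This is only a sketch: `flatten_leaves` is a hypothetical name introduced here, the data is inlined rather than read from the file, and the helper assumes the nesting contains only dicts and scalar values with unique leaf names.

```python
import pandas as pd


def flatten_leaves(obj, out=None):
    """Collect leaf key/value pairs from nested dicts (hypothetical helper)."""
    if out is None:
        out = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            flatten_leaves(value, out)  # descend into the nested dict
        else:
            out[key] = value  # a repeated leaf name would overwrite the earlier one
    return out


# Same structure as the file above, inlined for the example.
json_data = {
    "response": {
        "version": "1.1",
        "token": "dsfgf",
        "body": {
            "customer": {"customer_id": "1234567", "verified": "true"},
            "contact": {"email": "[email protected]", "mobile_number": "0123456789"},
            "personal": {
                "gender": "m",
                "title": "Dr.",
                "last_name": "Muster",
                "first_name": "Max",
                "family_status": "single",
                "dob": "1985-12-23",
            },
        },
    }
}

# Wrapping the flat dict in a list gives a one-row DataFrame.
df = pd.DataFrame([flatten_leaves(json_data)])
print(df.columns.tolist())
```

Unlike the hand-built dictionary, this picks up every leaf, so "gender" and "title" appear as columns too.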
