Nested json object column into dataframe

Question

I have a DataFrame (df1) containing two columns:

id          information 
00100       {'DriversList': {'ProblematicDrivers': [], 'In...   
00200       {'DriversList': {'ProblematicDrivers': [], 'In...

The information column contains a nested JSON object, which needs to be converted into a DataFrame and associated with the id.

JSON in the df1['information'] column:

{
  'DriversList': {
    'ProblematicDrivers': [],
    'InstalledDrivers': [
      {
        'DriverName': 'FaxMachine',
        'DisplayName': 'Fax',
        'Version': '10',
        'Date': '06-21-2006'
      },
      {
        'DriverName': 'FaxMachine',
        'DisplayName': 'Fax',
        'Version': '10',
        'Date': '06-21-2006'
      }
    ]
  }
}
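The snippet above uses single quotes, which is Python-literal syntax rather than valid JSON. If the cells are stored as strings in that form (an assumption; the question does not say whether they are dicts or strings), `json.loads` will reject them, which may explain the loading failure. A minimal sketch that parses either form with `ast.literal_eval` as a fallback; `parse_cell` and the sample `cell` are hypothetical names for illustration:

```python
import ast
import json

# Hypothetical cell value: a string in Python-literal form (single quotes),
# shaped like the snippet above.
cell = ("{'DriversList': {'ProblematicDrivers': [], 'InstalledDrivers': "
        "[{'DriverName': 'FaxMachine', 'DisplayName': 'Fax', "
        "'Version': '10', 'Date': '06-21-2006'}]}}")


def parse_cell(value):
    """Return the cell as a dict, whatever form it was stored in (hypothetical helper)."""
    if isinstance(value, dict):
        return value  # already parsed
    try:
        return json.loads(value)  # valid JSON (double quotes)
    except json.JSONDecodeError:
        return ast.literal_eval(value)  # Python-literal form (single quotes)


info = parse_cell(cell)
print(info["DriversList"]["InstalledDrivers"][0]["DriverName"])  # FaxMachine
```

`ast.literal_eval` only evaluates literals (dicts, lists, strings, numbers), so unlike `eval` it will not execute arbitrary code found in the data.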

My code so far:

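import pandas as pd

# assumes each cell of df1['information'] is already a dict (not a JSON string)
data = pd.json_normalize(df1['information'].tolist())
frames = []
for x in data['DriversList.InstalledDrivers']:
    frames.append(pd.DataFrame(x))  # DataFrame.append was removed in pandas 2.0
df2 = pd.concat(frames, ignore_index=True)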

Each record in the information column should be associated with its row's id from the original DataFrame (df1).

For example, because the first row's information column contains two InstalledDrivers records, the final output should have 00100 on two rows.
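Repeating an id once per list element is what pandas' `Series.explode` does. A minimal DataFrame-level sketch, assuming the information cells are already dicts (parse them first if they are strings); the df1 below is rebuilt by hand from the question:

```python
import pandas as pd

# Hypothetical df1 rebuilt from the question; both rows carry the same data.
info = {"DriversList": {"ProblematicDrivers": [], "InstalledDrivers": [
    {"DriverName": "FaxMachine", "DisplayName": "Fax", "Version": "10", "Date": "06-21-2006"},
    {"DriverName": "FaxMachine", "DisplayName": "Fax", "Version": "10", "Date": "06-21-2006"},
]}}
df1 = pd.DataFrame({"id": ["00100", "00200"], "information": [info, info]})

# One Series entry per installed driver; explode repeats the id index label
drivers = (
    df1.set_index("id")["information"]
    .map(lambda d: d["DriversList"]["InstalledDrivers"])
    .explode()
    .dropna()  # an empty driver list explodes to NaN; drop it
)

# Expand each driver dict into columns, keeping the repeated id
result = pd.DataFrame(drivers.tolist(), index=drivers.index).reset_index()
print(result)
```

Because the id lives in the index during the explode, it is carried along automatically and `reset_index` turns it back into the first column.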

Expected output:

id      Date        DriverName  DisplayName   Version
00100   06-21-2006  FaxMachine  Fax           10
00100   06-21-2006  FaxMachine  Fax           10
00200   06-21-2006  FaxMachine  Fax           10
00200   06-21-2006  FaxMachine  Fax           10

I'm looking for an approach that works at the DataFrame level. I've also tried json_normalize, but I was unable to load this JSON into a DataFrame. Is it possible to do this with json_normalize, or is there a more efficient solution? I also can't work out how to associate the id with the converted DataFrame.
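`pd.json_normalize` can do this directly through its `record_path` and `meta` parameters: `record_path` points at the nested list to expand into rows, and `meta` names fields to repeat on each of those rows. A minimal sketch, assuming the information cells are dicts and copying each row's id into its dict first so it can serve as metadata (the df1 below is rebuilt by hand from the question):

```python
import pandas as pd

# Hypothetical df1 rebuilt from the question (cells assumed to be dicts).
info = {"DriversList": {"ProblematicDrivers": [], "InstalledDrivers": [
    {"DriverName": "FaxMachine", "DisplayName": "Fax", "Version": "10", "Date": "06-21-2006"},
    {"DriverName": "FaxMachine", "DisplayName": "Fax", "Version": "10", "Date": "06-21-2006"},
]}}
df1 = pd.DataFrame({"id": ["00100", "00200"], "information": [info, info]})

# Put each row's id into its dict so json_normalize can carry it as metadata
records = [{"id": i, **d} for i, d in zip(df1["id"], df1["information"])]

result = pd.json_normalize(
    records,
    record_path=["DriversList", "InstalledDrivers"],  # the list to expand into rows
    meta=["id"],  # top-level field repeated on every row from that record
)

# json_normalize appends meta columns last; reorder to match the expected output
result = result[["id", "Date", "DriverName", "DisplayName", "Version"]]
print(result)
```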

Do you mind sharing the original DataFrame in dict form, including the ids, so that a solution can be proffered that covers both columns?
– sammywemmy, Commented Apr 12, 2020 at 22:22
I've shared the original DataFrame (df1) at the start; the data in the information column is simply the same in both rows.
– spontaneous_coder, Commented Apr 12, 2020 at 22:43

Dani Mesejo · Accepted Answer · 2020-04-12 22:37:06Z


IIUC, this is a possible approach:

import json
import pandas as pd

# setup
d = """{"DriversList": {
    "ProblematicDrivers": [],
    "InstalledDrivers": [
        {"DriverName": "FaxMachine", "DisplayName": "Fax", "Version": "10", "Date": "06-21-2006"},
        {"DriverName": "FaxMachine", "DisplayName": "Fax", "Version": "10", "Date": "06-21-2006"}
    ]}
}"""
df = pd.DataFrame(data=[d], columns=["information"])

# extract data
data = [drivers for info in df["information"].values for drivers in json.loads(info)["DriversList"]["InstalledDrivers"]]

# create DataFrame
result = pd.DataFrame.from_records(data)

print(result)

Output

   DriverName DisplayName Version        Date
0  FaxMachine         Fax      10  06-21-2006
1  FaxMachine         Fax      10  06-21-2006

Update

You can associate each id with its drivers by doing the following:

df = pd.DataFrame(data=[['00100', d]], columns=["id", "information"])

# extract data
data = [{"id": i, **drivers} for i, info in df[["id", "information"]].values for drivers in json.loads(info)["DriversList"]["InstalledDrivers"]]

# create DataFrame
result = pd.DataFrame.from_records(data)

print(result)

This adds an id entry to each driver record.
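The Update builds a DataFrame with a single id. To check that the same comprehension produces the question's expected output when several drivers share one id and there are two ids, here is a sketch with both rows; the second row is assumed to carry the same JSON as the first, as the asker stated in the comments:

```python
import json

import pandas as pd

# Same JSON string as in the answer's setup
d = """{"DriversList": {
    "ProblematicDrivers": [],
    "InstalledDrivers": [
        {"DriverName": "FaxMachine", "DisplayName": "Fax", "Version": "10", "Date": "06-21-2006"},
        {"DriverName": "FaxMachine", "DisplayName": "Fax", "Version": "10", "Date": "06-21-2006"}
    ]}
}"""

# Both ids from the question
df = pd.DataFrame(data=[["00100", d], ["00200", d]], columns=["id", "information"])

# Same comprehension as the Update: one record per driver, tagged with its id
data = [{"id": i, **drivers}
        for i, info in df[["id", "information"]].values
        for drivers in json.loads(info)["DriversList"]["InstalledDrivers"]]

result = pd.DataFrame.from_records(data)
# Optional: reorder columns to match the question's expected output
result = result[["id", "Date", "DriverName", "DisplayName", "Version"]]
print(result)
```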

edited Apr 12, 2020 at 22:37

answered Apr 12, 2020 at 21:25

Dani Mesejo



3 Comments

spontaneous_coder Over a year ago

Hey, thanks, it worked for the column, but I'm still curious how the original DataFrame's id will be associated with each row.

Dani Mesejo Over a year ago

@user2597209 So in your output, more than one driver can have the same id, right? For example, in your data the two installed drivers will both have id 00100?

spontaneous_coder Over a year ago

Yes, correct. I missed the second row's JSON data in the shared snippet.
