Nested json containing list of json-strings to dataframe

Question

I have the following JSON that I would like to load into a DataFrame:

jsonstr = {
  "id": "12345",
  "ename": "A4.txt",
  "Zoom1": {
    "Zoom1_res": [
      {
        "code": "A1",
        "x": 3211,
        "y": 677,
        "part": "11",
        "lace": "29",
        "name": "COVER"
      },
      {
        "code": "A4",
        "x": 3492,
        "y": 1109,
        "part": "10",
        "lace": "19",
        "name": "ARMOUR"
      }
    ]
  },
  "iSize": {
    "width": 4608,
    "height": 3456
  },
  "Action": {
    "AA": {
      "detect": [
        {
          "class": "aa",
          "prob": 0.92,
          "Box": {
            "x0": 4406,
            "y0": 670,
            "x1": 4558,
            "y1": 760
          }
        },
        {
          "class": "aa",
          "prob": 0.92,
          "Box": {
            "x0": 3762,
            "y0": 655,
            "x1": 3913,
            "y1": 747
          }
        }
      ]
    }
  }
}

Using pd.read_json in the following way:

df = pd.read_json(jsonstr)

returns

 id   ename                                              Zoom1  \
Zoom1_res  12345  A4.txt  [{'code': 'A1', 'x': 3211, 'y': 677, 'part': '...   
width      12345  A4.txt                                                NaN   
height     12345  A4.txt                                                NaN   
AA         12345  A4.txt                                                NaN   

            iSize                                             Action  
Zoom1_res     NaN                                                NaN  
width      4608.0                                                NaN  
height     3456.0                                                NaN  
AA            NaN  {'detect': [{'class': 'aa', 'prob': 0.92, 'Box...

and

pd.json_normalize(df['Action'])

returns the error

AttributeError: 'float' object has no attribute 'values'

So, I thought that applying

from ast import literal_eval

pd.json_normalize(df['Action'].apply(lambda x: literal_eval(x)["detect"]).explode())

might solve the problem, but that column contains NaN values, so even this doesn't work.

What I actually want is:

In the best of all worlds: id, ename, code, x, y, x0, y0, x1, y1

All other data is of no worth to me.

Grateful for any insight!

Pawan Jain · Accepted Answer · 2021-05-19 07:54:23Z

Your JSON is nested on multiple levels, so flatten each nested list separately and then combine the results:

1. Creating sub dataframes

df1 = pd.json_normalize(jsonstr, record_path=['Action','AA','detect'], meta=['id','ename'])
df2 = pd.json_normalize(jsonstr, record_path=['Zoom1','Zoom1_res'], meta=['id','ename'])
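
To make the effect of record_path and meta concrete, here is a minimal self-contained sketch. It assumes the data is already a Python dict (as the literal in the question is); the variable name data and the trimmed records are only for illustration. Each element of the list found at record_path becomes one row, the nested Box dict is flattened into dotted column names, and the meta fields are repeated on every row.

```python
import pandas as pd

# Trimmed copy of the question's data, held as a Python dict (an assumption:
# if you have a JSON *string*, parse it with json.loads first).
data = {
    "id": "12345",
    "ename": "A4.txt",
    "Zoom1": {"Zoom1_res": [
        {"code": "A1", "x": 3211, "y": 677},
        {"code": "A4", "x": 3492, "y": 1109},
    ]},
    "Action": {"AA": {"detect": [
        {"class": "aa", "prob": 0.92,
         "Box": {"x0": 4406, "y0": 670, "x1": 4558, "y1": 760}},
        {"class": "aa", "prob": 0.92,
         "Box": {"x0": 3762, "y0": 655, "x1": 3913, "y1": 747}},
    ]}},
}

# One row per element of Action -> AA -> detect; id and ename copied to each row.
df1 = pd.json_normalize(data, record_path=["Action", "AA", "detect"],
                        meta=["id", "ename"])

# One row per element of Zoom1 -> Zoom1_res, with the same meta columns.
df2 = pd.json_normalize(data, record_path=["Zoom1", "Zoom1_res"],
                        meta=["id", "ename"])

print(df1.columns.tolist())  # includes dotted names such as "Box.x0"
print(df2.columns.tolist())
```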

2. Shifting Data to NaN

As I understand it, gBOX and BOX are the same attribute, so you can merge them as shown here. Adjust this step as needed to get the data you want.

df3 = df1.apply(lambda x: pd.Series(x.dropna().values), axis=1)
df3.columns = ['class','prob','x0','y0','x1','y1','id','ename']
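
What the row-wise dropna does is easier to see on a toy frame. The sketch below is an illustration only: the column names Box.x0 and gBox.x0 are hypothetical stand-ins for two differently named keys that hold the same attribute. Each row drops its NaN cells and the remaining values are packed to the left, so both rows end up in the same columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 filled the "Box.*" columns, row 1 the "gBox.*" ones.
raw = pd.DataFrame({
    "Box.x0":  [4406, np.nan],
    "Box.y0":  [670, np.nan],
    "gBox.x0": [np.nan, 3762],
    "gBox.y0": [np.nan, 655],
})

# For every row: drop NaN cells, keep the surviving values in order, and
# return them as a fresh Series so they land in columns 0, 1, ...
shifted = raw.apply(lambda row: pd.Series(row.dropna().values), axis=1)

# The positional columns carry no names, so assign them explicitly.
shifted.columns = ["x0", "y0"]
print(shifted)
```

Because the new names are assigned by position, this only works if every row keeps its values in the same order and has the same number of non-NaN cells.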

3. Get required columns as per your data

df4 = pd.merge(df3, df2, on=['id','ename'])
df4 = df4.iloc[:,[6,7,8,9,10,2,3,4,5]]
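
One thing to be aware of: every row in both frames carries the same id and ename, so merging on those keys pairs each detection with each Zoom1 row (2 x 2 = 4 rows for the sample data). If what you want is the first detection next to the first Zoom1 entry, the second next to the second, and so on, a positional join is one alternative. That the two lists correspond by position is an assumption; nothing in the data guarantees it. A minimal sketch with trimmed data, where the names data, detect, zoom, and wanted are illustrative:

```python
import pandas as pd

data = {
    "id": "12345",
    "ename": "A4.txt",
    "Zoom1": {"Zoom1_res": [
        {"code": "A1", "x": 3211, "y": 677},
        {"code": "A4", "x": 3492, "y": 1109},
    ]},
    "Action": {"AA": {"detect": [
        {"class": "aa", "prob": 0.92,
         "Box": {"x0": 4406, "y0": 670, "x1": 4558, "y1": 760}},
        {"class": "aa", "prob": 0.92,
         "Box": {"x0": 3762, "y0": 655, "x1": 3913, "y1": 747}},
    ]}},
}

detect = pd.json_normalize(data, record_path=["Action", "AA", "detect"],
                           meta=["id", "ename"])
zoom = pd.json_normalize(data, record_path=["Zoom1", "Zoom1_res"],
                         meta=["id", "ename"])

# Rename by label instead of by position: "Box.x0" -> "x0", etc.
detect = detect.rename(columns=lambda c: c.replace("Box.", ""))

wanted = ["id", "ename", "code", "x", "y", "x0", "y0", "x1", "y1"]

# Key-based merge: every zoom row is combined with every detect row.
merged = pd.merge(zoom, detect, on=["id", "ename"])[wanted]

# Positional join (ASSUMES row i of zoom belongs with row i of detect).
paired = zoom.join(detect.drop(columns=["id", "ename"]))[wanted]

print(len(merged), len(paired))
print(paired)
```

Selecting columns by name, as with wanted here, also avoids depending on the exact column order that iloc relies on.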

edited May 19, 2021 at 7:54

answered May 18, 2021 at 12:30


1 Comment

Serge de Gosson de Varennes Over a year ago

This is really really nice! One comment though: , left_index=True, right_index=True doesn't work for me. Thank you!
