I have the following JSON that I would like to load into a pandas DataFrame:

jsonstr = {
  "id": "12345",
  "ename": "A4.txt",
  "Zoom1": {
    "Zoom1_res": [
      {
        "code": "A1",
        "x": 3211,
        "y": 677,
        "part": "11",
        "lace": "29",
        "name": "COVER"
      },
      {
        "code": "A4",
        "x": 3492,
        "y": 1109,
        "part": "10",
        "lace": "19",
        "name": "ARMOUR"
      }
    ]
  },
  "iSize": {
    "width": 4608,
    "height": 3456
  },
  "Action": {
    "AA": {
      "detect": [
        {
          "class": "aa",
          "prob": 0.92,
          "Box": {
            "x0": 4406,
            "y0": 670,
            "x1": 4558,
            "y1": 760
          }
        },
        {
          "class": "aa",
          "prob": 0.92,
          "Box": {
            "x0": 3762,
            "y0": 655,
            "x1": 3913,
            "y1": 747
          }
        }
      ]
    }
  }
}

Using pd.read_json in the following way:

df = pd.read_json(jsonstr)

returns

              id   ename                                              Zoom1  \
Zoom1_res  12345  A4.txt  [{'code': 'A1', 'x': 3211, 'y': 677, 'part': '...   
width      12345  A4.txt                                                NaN   
height     12345  A4.txt                                                NaN   
AA         12345  A4.txt                                                NaN   

            iSize                                             Action  
Zoom1_res     NaN                                                NaN  
width      4608.0                                                NaN  
height     3456.0                                                NaN  
AA            NaN  {'detect': [{'class': 'aa', 'prob': 0.92, 'Box...  

and

pd.json_normalize(df['Action'])

returns the error

AttributeError: 'float' object has no attribute 'values'

So, I thought that applying

from ast import literal_eval

pd.json_normalize(df['Action'].apply(lambda x: literal_eval(x)["detect"]).explode())

might solve the problem, but that column contains NaN values, so this fails as well.
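One way around the NaN cells, sketched under the assumption that the non-NaN cells of the Action column already hold dicts (as the printed output suggests) rather than strings: skip everything that is not a dict, collect the "detect" records, and normalize those. The Series below is a hand-built stand-in for df['Action'], not the real frame.

```python
import numpy as np
import pandas as pd

# Stand-in for df['Action']: NaN everywhere except the "AA" row (assumed layout).
action = pd.Series(
    [
        np.nan,
        np.nan,
        np.nan,
        {"detect": [
            {"class": "aa", "prob": 0.92,
             "Box": {"x0": 4406, "y0": 670, "x1": 4558, "y1": 760}},
            {"class": "aa", "prob": 0.92,
             "Box": {"x0": 3762, "y0": 655, "x1": 3913, "y1": 747}},
        ]},
    ],
    index=["Zoom1_res", "width", "height", "AA"],
    name="Action",
)

# Keep only dict cells (NaN is a float), then gather every "detect" record.
records = [rec for cell in action if isinstance(cell, dict) for rec in cell["detect"]]

# Nested "Box" dicts are flattened into dotted column names such as "Box.x0".
boxes = pd.json_normalize(records)
print(boxes)
```

Because the cells are already Python objects in this sketch, literal_eval is not needed; it is meant for parsing strings.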

What I actually want is:

Ideally, a DataFrame with the columns id, ename, code, x, y, x0, y0, x1, y1

I don't need any of the other data.

Grateful for any insight!

1 Answer

Your JSON is nested on several levels, so flatten each nested list separately and then combine the results.

1. Creating sub-DataFrames

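import pandas as pd

# One row per entry in Action -> AA -> detect, with id and ename repeated on each row
df1 = pd.json_normalize(jsonstr, record_path=['Action', 'AA', 'detect'], meta=['id', 'ename'])
# One row per entry in Zoom1 -> Zoom1_res, with id and ename repeated on each row
df2 = pd.json_normalize(jsonstr, record_path=['Zoom1', 'Zoom1_res'], meta=['id', 'ename'])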
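To see what record_path and meta do in isolation, here is a minimal sketch on a cut-down document. The keys "outer" and "items" are made up for illustration and do not appear in the question's data.

```python
import pandas as pd

# Hypothetical, reduced document: one top-level field plus one nested list.
doc = {
    "id": "12345",
    "outer": {"items": [{"code": "A1", "x": 1}, {"code": "A4", "x": 2}]},
}

# record_path walks down to the list whose elements become rows;
# meta names top-level fields that are copied onto every row.
flat = pd.json_normalize(doc, record_path=["outer", "items"], meta=["id"])
print(flat)
```

Each element of the nested list becomes one row, and the meta field is appended as an extra column.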

2. Shifting values into the NaN cells

As I understand it, gBOX and BOX are the same attribute, so you can merge them as shown here: dropping the NaN values row by row shifts the remaining values to the left. Adjust this as needed to get the data you want.

df3 = df1.apply(lambda x: pd.Series(x.dropna().values), axis=1)
df3.columns = ['class','prob','x0','y0','x1','y1','id','ename']

3. Select the required columns

df4 = pd.merge(df3, df2, on=['id','ename'])
df4 = df4.iloc[:,[6,7,8,9,10,2,3,4,5]]
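For reference, the three steps can be put together into one runnable sketch using the data from the question. The variable names (data, boxes, zoom, wanted) are mine, and selecting columns by name instead of by position is a swapped-in choice rather than part of the answer above. Note that merging on id and ename pairs every Zoom1_res row with every detect row; if the two lists are instead meant to line up row by row (an assumption about the data), joining on the row index keeps one row per pair.

```python
import pandas as pd

data = {
    "id": "12345",
    "ename": "A4.txt",
    "Zoom1": {"Zoom1_res": [
        {"code": "A1", "x": 3211, "y": 677, "part": "11", "lace": "29", "name": "COVER"},
        {"code": "A4", "x": 3492, "y": 1109, "part": "10", "lace": "19", "name": "ARMOUR"},
    ]},
    "iSize": {"width": 4608, "height": 3456},
    "Action": {"AA": {"detect": [
        {"class": "aa", "prob": 0.92, "Box": {"x0": 4406, "y0": 670, "x1": 4558, "y1": 760}},
        {"class": "aa", "prob": 0.92, "Box": {"x0": 3762, "y0": 655, "x1": 3913, "y1": 747}},
    ]}},
}

# Flatten each nested list, carrying id and ename along on every row.
boxes = pd.json_normalize(data, record_path=["Action", "AA", "detect"], meta=["id", "ename"])
zoom = pd.json_normalize(data, record_path=["Zoom1", "Zoom1_res"], meta=["id", "ename"])

# "Box.x0" -> "x0", and so on.
boxes = boxes.rename(columns=lambda c: c.replace("Box.", ""))

wanted = ["id", "ename", "code", "x", "y", "x0", "y0", "x1", "y1"]

# Option A: merge on the shared keys; every zoom row meets every box row (2 x 2 rows).
crossed = pd.merge(zoom, boxes, on=["id", "ename"])[wanted]

# Option B: pair rows by position (assumes the two lists correspond row by row).
paired = zoom.join(boxes.drop(columns=["id", "ename"]))[wanted]

print(crossed)
print(paired)
```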

1 Comment

This is really, really nice! One comment though: passing left_index=True, right_index=True doesn't work for me. Thank you!
