
My API returns a JSON file with the following structure:

{
    "results": [
        {
            "statement_id": 0,
            "series": [
                {
                    "name": "PCJeremy",
                    "tags": {
                        "host": "001"
                    },
                    "columns": [
                        "time",
                        "memory"
                    ],
                    "values": [
                        ["2021-03-20T23:00:00Z", 1049911288],
                        ["2021-03-21T00:00:00Z", 1057692712]
                    ]
                },
                {
                    "name": "PCJohnny",
                    "tags": {
                        "host": "002"
                    },
                    "columns": [
                        "time",
                        "memory"
                    ],
                    "values": [
                        ["2021-03-20T23:00:00Z", 407896064],
                        ["2021-03-21T00:00:00Z", 406847488]
                    ]
                }
            ]
        }
    ]
}

I want to transform this output into a pandas dataframe so I can create some reports from it. I tried using the pd.DataFrame.from_dict method:

import json
import pandas as pd

with open(fn) as f:  # fn is the path to the JSON file
    data = json.load(f)
print(pd.DataFrame.from_dict(data))

But as a resulting set, I just get one column and one row with all the data back:

results 0 {'statement_id': 0, 'series': [{'name': 'Jerem...

I find the structure hard to follow, as I am not a professional. I would like a dataframe with four columns (name, host, time and memory) and one row for every entry in each series' values list. Example:

name      host  time                    memory
PCJeremy  001   "2021-03-20T23:00:00Z"  1049911288
PCJeremy  001   "2021-03-21T00:00:00Z"  1057692712

Is this in any way possible? Thanks a lot in advance!

2 Answers


First, extract the data you are interested in from the JSON:

extracted_data = []

for series in data['results'][0]['series']:
    d = {}
    d['name'] = series['name']
    d['host'] = series['tags']['host']
    d['time'] = [value[0] for value in series['values']]
    d['memory'] = [value[1] for value in series['values']]

    extracted_data.append(d)

df = pd.DataFrame(extracted_data)
# print(df)

       name host                                          time                    memory
0  PCJeremy  001  [2021-03-20T23:00:00Z, 2021-03-21T00:00:00Z]  [1049911288, 1057692712]
1  PCJohnny  002  [2021-03-20T23:00:00Z, 2021-03-21T00:00:00Z]    [407896064, 406847488]

Second, explode the list-valued columns into rows:

df1 = pd.concat([df.explode('time')['time'], df.explode('memory')['memory']], axis=1)

df_ = df.drop(['time','memory'], axis=1).join(df1).reset_index(drop=True)
# print(df_)

       name host                  time      memory
0  PCJeremy  001  2021-03-20T23:00:00Z  1049911288
1  PCJeremy  001  2021-03-21T00:00:00Z  1057692712
2  PCJohnny  002  2021-03-20T23:00:00Z   407896064
3  PCJohnny  002  2021-03-21T00:00:00Z   406847488
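
On pandas 1.3 or newer, `DataFrame.explode` accepts a list of columns, so the concat/join step above can probably be collapsed into a single call. A minimal sketch, assuming the list-valued frame from the first step (rebuilt inline here so the snippet runs by itself; the name `flat` is only illustrative):

```python
import pandas as pd

# Same shape as the intermediate frame above: one row per series,
# with list-valued "time" and "memory" columns.
df = pd.DataFrame({
    "name": ["PCJeremy", "PCJohnny"],
    "host": ["001", "002"],
    "time": [["2021-03-20T23:00:00Z", "2021-03-21T00:00:00Z"],
             ["2021-03-20T23:00:00Z", "2021-03-21T00:00:00Z"]],
    "memory": [[1049911288, 1057692712], [407896064, 406847488]],
})

# Explode both list columns in lockstep (needs pandas >= 1.3);
# ignore_index=True renumbers the resulting rows 0..n-1.
flat = df.explode(["time", "memory"], ignore_index=True)
print(flat)
```

The exploded columns keep `object` dtype, so cast `memory` (for example with `astype("int64")`) before doing arithmetic on it.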

If you construct the dicts carefully, one per row, you can avoid exploding altogether:

extracted_data = []

for series in data['results'][0]['series']:
    d = {}
    d['name'] = series['name']
    d['host'] = series['tags']['host']

    for values in series['values']:
        d_ = d.copy()
        for column, value in zip(series['columns'], values):
            d_[column] = value

        extracted_data.append(d_)

df = pd.DataFrame(extracted_data)
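
For reporting, it may help to wrap the loop above in a function and parse the timestamps. The sketch below is one way to do that, not the only one: the function name `series_json_to_frame` is hypothetical, and it assumes every series has `name`, `tags.host`, `columns` and `values` keys as in the sample.

```python
import pandas as pd

def series_json_to_frame(data):
    """Return one row per entry in every series' "values" list."""
    rows = []
    for result in data.get("results", []):
        for series in result.get("series", []):
            base = {"name": series["name"], "host": series["tags"]["host"]}
            for values in series["values"]:
                # Pair each value with its column name ("time", "memory", ...).
                rows.append({**base, **dict(zip(series["columns"], values))})
    frame = pd.DataFrame(rows)
    if "time" in frame:
        # Parse the ISO 8601 strings so the column can be grouped or resampled.
        frame["time"] = pd.to_datetime(frame["time"])
    return frame

# Trimmed copy of the sample from the question.
sample = {"results": [{"statement_id": 0, "series": [
    {"name": "PCJeremy", "tags": {"host": "001"}, "columns": ["time", "memory"],
     "values": [["2021-03-20T23:00:00Z", 1049911288],
                ["2021-03-21T00:00:00Z", 1057692712]]},
    {"name": "PCJohnny", "tags": {"host": "002"}, "columns": ["time", "memory"],
     "values": [["2021-03-20T23:00:00Z", 407896064],
                ["2021-03-21T00:00:00Z", 406847488]]},
]}]}

df = series_json_to_frame(sample)
print(df)
```

Unlike the loop above, this walks every entry in `results` rather than only the first, which should matter only if the API returns more than one statement.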

You could use jmespath to extract the data; it is a handy tool for nested JSON like this. The docs have the details, but the basics are: to access a key, use a dot; to access the values in a list, use []. Combining the two lets you traverse any path in the JSON. There are more tools, but these basics should get you started.

Assume your JSON has been loaded into a variable named data:

data
 
{'results': [{'statement_id': 0,
   'series': [{'name': 'PCJeremy',
     'tags': {'host': '001'},
     'columns': ['time', 'memory'],
     'values': [['2021-03-20T23:00:00Z', 1049911288],
      ['2021-03-21T00:00:00Z', 1057692712]]},
    {'name': 'PCJohnny',
     'tags': {'host': '002'},
     'columns': ['time', 'memory'],
     'values': [['2021-03-20T23:00:00Z', 407896064],
      ['2021-03-21T00:00:00Z', 406847488]]}]}]}

Let's create an expression to parse the json, and get the specific values:

expression = """{name: results[].series[].name, 
                 host: results[].series[].tags.host, 
                 time: results[].series[].values[*][0], 
                 memory: results[].series[].values[*][-1]}
             """

Compile the expression and run it against the JSON data:

expression = jmespath.compile(expression).search(data)

expression
{'name': ['PCJeremy', 'PCJohnny'],
 'host': ['001', '002'],
 'time': [['2021-03-20T23:00:00Z', '2021-03-21T00:00:00Z'],
  ['2021-03-20T23:00:00Z', '2021-03-21T00:00:00Z']],
 'memory': [[1049911288, 1057692712], [407896064, 406847488]]}

Note that time and memory are nested lists, with one inner list per series, and that they match the values in data.

Create a dataframe and explode the relevant columns:

pd.DataFrame(expression).apply(pd.Series.explode)

       name host                  time      memory
0  PCJeremy  001  2021-03-20T23:00:00Z  1049911288
0  PCJeremy  001  2021-03-21T00:00:00Z  1057692712
1  PCJohnny  002  2021-03-20T23:00:00Z   407896064
1  PCJohnny  002  2021-03-21T00:00:00Z   406847488
