1

I have the following json:

{
    "meta": {
        "collection": 0
    },
    "series": {
        "default": {
            "data": {
                "columns": [
                    {
                        "columnName": "A",
                        "columnType": "STRING"
                    },
                    {
                        "columnName": "B",
                        "columnType": "STRING"
                    }
                ],
                "rows": [
                    [
                        {
                            "columnName": "A",
                            "value": "X1"
                        },
                        {
                            "columnName": "B",
                            "value": "Y1"
                        }
                    ],
                    [
                        {
                            "columnName": "A",
                            "value": "X2"
                        },
                        {
                            "columnName": "B",
                            "value": "Y2"
                        }
                    ]
                ]
            }
        }
    }
}

I'm trying to parse this json into a dataframe which should look like this:

    A    B
   --------
    X1   Y1
    X2   Y2

Here's what I have tried so far:

import pandas as pd
results = {"meta":{"collection":0},"series":{"default":{"data":{"columns":[{"columnName":"A","columnType":"STRING"},{"columnName":"B","columnType":"STRING"}],"rows":[[{"columnName":"A","value":"X1"},{"columnName":"B","value":"Y1"}],[{"columnName":"A","value":"X2"},{"columnName":"B","value":"Y2"}]]}}}}
s = results["series"]["default"]["data"]
df = pd.json_normalize(s, record_path=["rows"])

The problem is the columns in the df contain the json array, and not the values

                                    0                                   1
0  {'columnName': 'A', 'value': 'X1'}  {'columnName': 'B', 'value': 'Y1'}
1  {'columnName': 'A', 'value': 'X2'}  {'columnName': 'B', 'value': 'Y2'}

Is there anyway to use json_normalize to achieve the result I want, or is parsing through json myself the way to go?

1 Answer 1

2

One option is to stack it and create a DataFrame with the stacked Series and pivot it:

out = (pd.DataFrame(my_data['series']['default']['data']['rows'])
       .stack()
       .pipe(lambda x: pd.DataFrame(x.tolist(), index=x.index))
       .reset_index(level=0)
       .pivot('level_0','columnName','value')
       .rename_axis(index=[None], columns=[None]))

An easier (and probably more efficient) solution is to reformat the data in Python using dict.setdefault in a nested loop; then build the DataFrame:

tmp = {}
for lst in my_data['series']['default']['data']['rows']:
    for d in lst:
        tmp.setdefault(d['columnName'], []).append(d['value'])
out = pd.DataFrame(tmp)

Output:

    A   B
0  X1  Y1
1  X2  Y2
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.