
I have a complex, nested JSON that I need to transform into a pandas DataFrame (Python). I managed to get the first part, but I'm struggling with the second part.

import requests
from pandas import DataFrame
import json

url = 'url'

headers = {'api-key': 'key'}

resp = requests.get(url, headers=headers)
print(resp.status_code)

r = resp.content
r  # in a notebook, displays the raw response bytes

responses = json.loads(r.decode('utf-8'))
responses  # in a notebook, displays the parsed dict

Output (responses):

{'count': 39,
 'requestAt': '2020-06-09T20:10:23.201+00:00',
 'data': {'Id1': {'id': 'Id1',
   'groupId': '1',
   'label': 'Question 1',
   'options': {'1_1': {'id': '1_1',
     'prefix': 'A',
     'label': 'Alternative A',
     'isCorrect': True},
    '1_2': {'id': '1_2',
     'prefix': 'B',
     'label': 'Alternative B',
     'isCorrect': False},
    '1_3': {'id': '1_3',
     'prefix': 'C',
     'label': 'Alternative C',
     'isCorrect': False}}}}}

df = DataFrame(responses['data'])
df.T

Output (DataFrame.T):

+-----+---------+------------+-------------+
| id  | groupId |   label    | options     |
+-----+---------+------------+-------------+
| Id1 |       1 | Question 1 | **JSON 2**  |
+-----+---------+------------+-------------+
**JSON 2** (all of this is inside the options cell above):
{'1_1': {'id': '1_1', 'prefix': 'A', 'label': 'Alternative A', 'isCorrect': True},
 '1_2': {'id': '1_2', 'prefix': 'B', 'label': 'Alternative B', 'isCorrect': False},
 '1_3': {'id': '1_3', 'prefix': 'C', 'label': 'Alternative C', 'isCorrect': False}}

I need to expand JSON 2 into the DataFrame as well, with one row per option.

Desired output:

+-----+---------+------------+--------+---------------+-----------+
| id  | groupId |   label    | prefix |     label     | isCorrect |
+-----+---------+------------+--------+---------------+-----------+
| Id1 |       1 | Question 1 | A      | Alternative A | True      |
| Id1 |       1 | Question 1 | B      | Alternative B | False     |
| Id1 |       1 | Question 1 | C      | Alternative C | False     |
+-----+---------+------------+--------+---------------+-----------+

How do I get the desired output? Thanks.

  • This does not seem to be valid JSON; can you add a few rows from the response? Commented Jun 10, 2020 at 12:31
  • Is it OK now? I cut it down to a single question as an example; I hope it's fine now. Commented Jun 10, 2020 at 12:40

1 Answer


Here's a way to do this:

import pandas as pd 

responses = {
    'count': 39,
    'requestAt': '2020-06-09T20:10:23.201+00:00',
    'data': {
        'Id1': {
            'id': 'Id1',
            'groupId': '1',
            'label': 'Question 1',
            'options': {
                '1_1': {
                    'id': '1_1',
                    'prefix': 'A',
                    'label': 'Alternative A',
                    'isCorrect': True},
                '1_2': {
                    'id': '1_2',
                    'prefix': 'B',
                    'label': 'Alternative B',
                    'isCorrect': False},
                '1_3': {
                    'id': '1_3',
                    'prefix': 'C',
                    'label': 'Alternative C',
                    'isCorrect': False}
            }
        }
    }
}


# Refactor the response into a list of dicts, where each dict
# holds the keys and values for a single row of the DataFrame
response_list = []

for question in responses['data'].values():

    # Keep only the question-level keys of interest
    data = {k: v for k, v in question.items() if k in ['id', 'groupId', 'label']}

    # Rename 'label', because each option deeper inside the JSON also has a
    # 'label' key; this avoids two columns with the same name in the DataFrame
    data['label_'] = data.pop('label')

    # Dig into the options of the current question
    for option in question['options'].values():

        # Keep only the option-level keys of interest
        inner_data = {k: v for k, v in option.items() if k in ['prefix', 'label', 'isCorrect']}

        # Merge the two dicts and append the result as one row
        response_list.append({**data, **inner_data})

print(pd.DataFrame(response_list))

Here's the output:

    id groupId      label_ prefix          label  isCorrect
0  Id1       1  Question 1      A  Alternative A       True
1  Id1       1  Question 1      B  Alternative B      False
2  Id1       1  Question 1      C  Alternative C      False
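For comparison, pandas' own json_normalize (which the question tried to import) can produce the same shape, but its record_path has to point at a list, while options here is a dict keyed by option id. A minimal sketch, assuming pandas 1.0 or newer (where it is available as pd.json_normalize); the option_ column prefix is an illustrative choice to keep the option-level id and label apart from the question-level ones:

```python
import pandas as pd

# Sample data copied from the question (a single question with three options)
responses = {
    'count': 39,
    'requestAt': '2020-06-09T20:10:23.201+00:00',
    'data': {
        'Id1': {
            'id': 'Id1', 'groupId': '1', 'label': 'Question 1',
            'options': {
                '1_1': {'id': '1_1', 'prefix': 'A', 'label': 'Alternative A', 'isCorrect': True},
                '1_2': {'id': '1_2', 'prefix': 'B', 'label': 'Alternative B', 'isCorrect': False},
                '1_3': {'id': '1_3', 'prefix': 'C', 'label': 'Alternative C', 'isCorrect': False},
            },
        },
    },
}

# record_path must point at a list, so replace each 'options' dict
# with a list of its values (the option dicts themselves)
questions = [
    {**question, 'options': list(question['options'].values())}
    for question in responses['data'].values()
]

# One row per option; the question-level fields named in meta are repeated
# on every row. record_prefix renames the option columns (option_id, ...)
# so they do not clash with the question's 'id' and 'label'.
df = pd.json_normalize(
    questions,
    record_path='options',
    meta=['id', 'groupId', 'label'],
    record_prefix='option_',
)

print(df[['id', 'groupId', 'label', 'option_prefix', 'option_label', 'option_isCorrect']])
```

Unlike the loop above, this also keeps the option id (as option_id); drop or reorder columns afterwards as needed.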

2 Comments

That's exactly what I needed. I've been trying to do this for about a week; you can't imagine how much you helped me. Thanks! I'm going to study your code so I can handle new cases on my own.
Is there a way to do this for any given DataFrame without typing in the specific values of the columns? (I'd be fine typing in the column names, just not the values for each.) I have a set of massive DataFrames, and I would rather not type in hundreds of thousands of values manually.
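The answer's loop lists key names, not values, so no per-row values need typing. If you would rather not list even the key names, a variant can keep every non-nested field of the parent and every field of each option automatically. A minimal sketch, assuming every record has the same shape as the JSON in the question (one level of nesting under a dict-valued key); flatten_questions and its nested_key parameter are hypothetical names, and suffixing clashing parent keys with '_' mirrors the answer's label_ column:

```python
import pandas as pd


def flatten_questions(data, nested_key='options'):
    """Hypothetical helper: one row per nested option, repeating the parent's fields.

    Assumes each value of `data` is a dict whose `nested_key` entry is a dict
    of dicts. Parent keys that an option also defines get a trailing '_'.
    """
    rows = []
    for question in data.values():
        # Every parent field except the nested dict itself
        parent = {k: v for k, v in question.items() if k != nested_key}
        for option in question.get(nested_key, {}).values():
            # Suffix clashing parent keys so the option's values don't overwrite them
            row = {(k + '_' if k in option else k): v for k, v in parent.items()}
            row.update(option)
            rows.append(row)
    return pd.DataFrame(rows)


# Usage with the sample data from the question
responses = {
    'data': {
        'Id1': {
            'id': 'Id1', 'groupId': '1', 'label': 'Question 1',
            'options': {
                '1_1': {'id': '1_1', 'prefix': 'A', 'label': 'Alternative A', 'isCorrect': True},
                '1_2': {'id': '1_2', 'prefix': 'B', 'label': 'Alternative B', 'isCorrect': False},
                '1_3': {'id': '1_3', 'prefix': 'C', 'label': 'Alternative C', 'isCorrect': False},
            },
        },
    },
}

df = flatten_questions(responses['data'])
print(df)
```

Deeper or irregular nesting would need a recursive version; this sketch only handles the single level shown here.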
