I have a dictionary where the keys are GitHub repository names and the values contain JSON-formatted data.

Example:


    {'r1': [
        {'id': 1178421030,
         'name': 'x',
        },
        {'id': 1178420990,
         'name': 'y',
        }],
     'r2': [
        {'id': 1178421031,
         'name': 'a',
        },
        {'id': 1178420950,
         'name': 'b',
        }]
    }

I can create a dataframe from the JSON values in the dict using:

import pandas as pd

# DataFrame.append was removed in pandas 2.0, so build one frame per key and concatenate once
df = pd.concat(pd.json_normalize(responses[i]) for i in responses)

This gives me a df that looks like this:

         id name
 1178421030    x
 1178420990    y
 1178421031    a
 1178420950    b

I want the keys of the dict as another column named repo_name in the df, something like:

         id name repo_name
 1178421030    x        r1
 1178420990    y        r1
 1178421031    a        r2
 1178420950    b        r2

How should I go about doing this?

2 Answers

Let's say your dictionary is called "d":

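    import pandas as pd

    frames = []
    for repo, records in d.items():
        z = pd.DataFrame(records)
        z['repo_name'] = repo  # tag every row with its dictionary key
        frames.append(z)
    data = pd.concat(frames)  # one concat at the end instead of growing a frame in the loop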



           id name repo_name
0  1178421030    x        r1
1  1178420990    y        r1
0  1178421031    a        r2
1  1178420950    b        r2
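
Note that the index in the output above repeats (0, 1, 0, 1) because each per-repository frame keeps its own index. If a continuous index is preferred, one possible variation is to pass ignore_index=True to pd.concat. This is a minimal sketch, assuming "d" holds the sample dictionary from the question:

```python
import pandas as pd

# Sample data copied from the question
d = {
    'r1': [{'id': 1178421030, 'name': 'x'}, {'id': 1178420990, 'name': 'y'}],
    'r2': [{'id': 1178421031, 'name': 'a'}, {'id': 1178420950, 'name': 'b'}],
}

# Build one frame per key, tagging each with its repository name
frames = [pd.DataFrame(records).assign(repo_name=repo) for repo, records in d.items()]

# ignore_index=True discards the per-frame indexes and numbers the rows 0..n-1
data = pd.concat(frames, ignore_index=True)
print(data)
```

Whether to renumber is a matter of taste; keeping the original index can be useful if the position of a record within its repository matters.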

I would suggest using collections.defaultdict, which gives you more control over how the data is collected:

from collections import defaultdict

d = defaultdict(list)
for key, value in data.items():
    for entry in value:
        d["id"].append(entry["id"])
        d["name"].append(entry["name"])
        d["repo_name"].append(key)

d

defaultdict(list,
            {'id': [1178421030, 1178420990, 1178421031, 1178420950],
             'name': ['x', 'y', 'a', 'b'],
             'repo_name': ['r1', 'r1', 'r2', 'r2']})

Create dataframe:

pd.DataFrame(d)

           id name repo_name
0  1178421030    x        r1
1  1178420990    y        r1
2  1178421031    a        r2
3  1178420950    b        r2
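
If the records carry more fields than id and name, listing each field by hand gets tedious. One possible variation, sketched here under the assumption that every record is a flat dict, copies each record and adds the dictionary key as repo_name:

```python
import pandas as pd

# Sample data copied from the question
data = {
    'r1': [{'id': 1178421030, 'name': 'x'}, {'id': 1178420990, 'name': 'y'}],
    'r2': [{'id': 1178421031, 'name': 'a'}, {'id': 1178420950, 'name': 'b'}],
}

# Copy every record and attach the key it came from, without naming the fields
rows = [
    {**record, 'repo_name': repo}
    for repo, records in data.items()
    for record in records
]

df = pd.DataFrame(rows)
print(df)
```

This trades the explicit per-field control of the defaultdict version for less code, so it fits best when all fields should be kept as they are.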

Another option would be to call json_normalize once per key inside a generator expression and concatenate the results:

pd.concat(pd.json_normalize(data, record_path=[key]).assign(repo_name=key) 
          for key in data)
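
The json_normalize route is mainly worth it when records contain nested objects, since it flattens them into dotted column names. The sketch below illustrates this; the nested "owner" field and its values are hypothetical and not part of the question's data:

```python
import pandas as pd

# Hypothetical payload: the nested 'owner' field is invented for illustration
data = {
    'r1': [{'id': 1178421030, 'name': 'x', 'owner': {'login': 'alice'}}],
    'r2': [{'id': 1178421031, 'name': 'a', 'owner': {'login': 'bob'}}],
}

# Normalize each repository's records, tag them with the key, then stack them
result = pd.concat(
    (pd.json_normalize(records).assign(repo_name=repo) for repo, records in data.items()),
    ignore_index=True,
)
print(result.columns.tolist())
```

With flat records like those in the question, pd.DataFrame and pd.json_normalize should produce the same columns, so either works there.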
