How to convert json to Pandas Dataframe with nested objects?

Question

I am extracting some tweets and I am getting json (json_response) in return which looks something like this (I've added dummy IDs):

{
    "data": [
        {
            "author_id": "123456",
            "conversation_id": "7890",
            "created_at": "2020-03-01T23:59:58.000Z",
            "id": "12345678",
            "lang": "en",
            "public_metrics": {
                "like_count": 1,
                "quote_count": 2,
                "reply_count": 3,
                "retweet_count": 4
            },
            "referenced_tweets": [
                {
                    "id": "13664100",
                    "type": "retweeted"
                }
            ],
            "reply_settings": "everyone",
            "source": "Twitter for Android",
            "text": "This is a sample."
        }
    ],
    "includes": {
        "users": [
            {
                "created_at": "2018-08-29T23:45:37.000Z",
                "description": "",
                "id": "7890123",
                "name": "Twitter user",
                "public_metrics": {
                    "followers_count": 1199,
                    "following_count": 1351,
                    "listed_count": 0,
                    "tweet_count": 52607
                },
                "username": "user_123",
                "verified": false
            }
        ]
    }

I am trying to convert it into pandas dataframe using the following code:

import json
import pandas as pd

# pd.json_normalize already returns a DataFrame, so no from_dict wrapper is needed
df = pd.json_normalize(json_response['data'])

And it is giving me the output whose header is as follows:

conversation_id | text | source | reply_settings | referenced_tweets | id | created_at | lang | author_id | public_metrics.retweet_count | public_metrics.reply_count | public_metrics.like_count | public_metrics.quote_count | in_reply_to_user_id

This works, except that I also want username (from includes.users) as a column in the df alongside the other columns, and I don't know how to do that. Any guidance please?

Tranbi · Accepted Answer · 2021-12-06 09:38:28Z

IIUC you have a list of tweet dictionaries in json_response['data'] and a list of user dictionaries in json_response['includes']['users']. Why not create your own dictionary list from those two?

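import json
import pandas as pd

# response_raw is assumed to be the raw JSON string returned by the API
json_response = json.loads(response_raw)
your_dict_list = json_response['data']
# Pair tweets and users by position (assumes both lists have the same length and order)
for i, user in enumerate(json_response['includes']['users']):
    your_dict_list[i]['username'] = user['username']

df = pd.json_normalize(your_dict_list)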

Output:

  author_id conversation_id                created_at        id lang  ...  username public_metrics.like_count public_metrics.quote_count public_metrics.reply_count public_metrics.retweet_count
0    123456            7890  2020-03-01T23:59:58.000Z  12345678   en  ...  user_123                         1                          2                          3                            4
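Pairing by position only works when both lists have the same length and order. A minimal sketch of an alternative that joins on IDs instead, assuming (the question does not confirm this) that each tweet's author_id equals the id of an entry in includes.users. The sample data below is made up so that the IDs match:

```python
import pandas as pd

# Hypothetical response; the IDs are invented so that author_id matches a user id.
json_response = {
    "data": [
        {"id": "1", "author_id": "100", "text": "first",
         "public_metrics": {"like_count": 1, "retweet_count": 4}},
        {"id": "2", "author_id": "200", "text": "second",
         "public_metrics": {"like_count": 0, "retweet_count": 2}},
    ],
    "includes": {
        "users": [
            {"id": "200", "username": "user_b"},
            {"id": "100", "username": "user_a"},
            {"id": "300", "username": "user_c"},  # extra user with no tweet in "data"
        ]
    },
}

tweets = pd.json_normalize(json_response["data"])

# Keep only the columns needed for the join and rename "id" so it does not
# clash with the tweet's own "id" column.
users = pd.json_normalize(json_response["includes"]["users"])[["id", "username"]]
users = users.rename(columns={"id": "author_id"})

# Left join: every tweet is kept; a tweet without a matching user gets NaN.
df = tweets.merge(users, on="author_id", how="left")
print(df[["id", "author_id", "username"]])
```

Because the join is keyed on author_id, extra or reordered entries in includes.users do not shift usernames onto the wrong tweets.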

3 Comments

AneesBaqir Over a year ago

It is giving me this error:

IndexError                                Traceback (most recent call last)
<ipython-input-40-7b211ebb88ef> in <module>()
      1 your_dict_list = json_response['data']
      2 for i, user in enumerate(json_response['includes']['users']):
----> 3     your_dict_list[i]['username'] = user['username']
      4 df = pd.json_normalize(your_dict_list)

IndexError: list index out of range

Tranbi Over a year ago

Are json_response['data'] and json_response['includes']['users'] the same length? How do you actually load your data? The json in your question has a missing bracket, and it's not a python dictionary either (false instead of False). I assumed you would read your json as a string (response_raw in my example) and load it with json.loads.

AneesBaqir Over a year ago

Thank you for your response. It's strange that the length of json_response['data'] is 10 while json_response['includes']['users'] has 11. I am getting the data using these commands:

url = create_url(keyword, start_time, end_time, max_results)
json_response = connect_to_endpoint(url[0], headers, url[1])

and then using json_response further.
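Since the two lists differ in length, it may help to check which IDs actually line up before combining them. A rough diagnostic sketch, again assuming that author_id refers to a user id; the helper name compare_ids and the sample data are hypothetical:

```python
def compare_ids(json_response):
    """Return (author IDs without a user entry, user IDs without a tweet)."""
    author_ids = {tweet["author_id"] for tweet in json_response["data"]}
    user_ids = {user["id"] for user in json_response["includes"]["users"]}
    return author_ids - user_ids, user_ids - author_ids


# Made-up data: two tweets, three users.
sample = {
    "data": [{"id": "1", "author_id": "100"}, {"id": "2", "author_id": "200"}],
    "includes": {"users": [{"id": "100"}, {"id": "200"}, {"id": "300"}]},
}

missing_users, extra_users = compare_ids(sample)
print(len(sample["data"]), len(sample["includes"]["users"]))
print(missing_users, extra_users)
```

If the second set is non-empty, includes.users contains entries that do not author any tweet in data, which would explain why indexing the tweet list by user position runs out of range.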
