1

I've been collecting tweets into a JSON file, and I need to compute some statistics from certain fields in it. I Googled several ways of doing this, but none of them gave me a working solution.

The JSON looks like this:

{"contributors": null, "truncated": false, "text": .... }

I used this code to try to load it:

 import json
 f = open("user_timeline_Audi.jsonl",'r')
 data = f.read()
 print(data)
 bla = json.loads(data)

The json.loads() call gives me the following error:

json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 2698)

The end goal is to get the followers_count and likes from several JSON files. I hope someone can help!

EDIT:

Based on the answer from Alex Hall, my code now is:

import json

with open("user_timeline_BMW.jsonl",'r') as f:
    for line in f:
        obj = json.loads(line)
        bla = ["followers_count"]
        print(bla)

This just prints the list itself for every line, instead of the values stored under that key:

....
['followers_count']
['followers_count']
....

Hope someone has a suggestion for this step!

4
  • It's difficult to say without seeing the JSON file, but it looks like you are trying to load a file with multiple dictionaries. Take a look at the answer in this thread; it might be what you are looking for. Commented Mar 11, 2018 at 10:27
  • Thanks for your reply, but I couldn't find the solution there. Commented Mar 11, 2018 at 10:43
  • Check out @alex's reply. Might that be your issue? Commented Mar 11, 2018 at 11:03
  • It was for the json.loads() error! Now I need to figure out how to get the value from the lines. Commented Mar 11, 2018 at 11:09

2 Answers

4

You are dealing with JSON lines, where each line contains one JSON object. You should do:

for line in f:
    obj = json.loads(line)

and then do what you want with each object.
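
As a minimal sketch of that loop in full: the helper name read_json_lines below is made up for illustration, and skipping blank lines is an assumption about the file (json.loads raises on an empty string, so a trailing blank line would otherwise fail).

```python
import json


def read_json_lines(path):
    """Yield one parsed object per non-empty line of a JSON Lines file.

    read_json_lines is a hypothetical helper name, not part of the json module.
    """
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines, which json.loads would reject
            yield json.loads(line)


if __name__ == "__main__":
    # File name taken from the question; each obj is one tweet as a dict.
    for obj in read_json_lines("user_timeline_Audi.jsonl"):
        print(obj["text"])
```

Calling json.loads() on the whole file fails with "Extra data" because the parser finds a second object after the first one ends; parsing line by line avoids that.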


1 Comment

Thanks for your reply, it got me a bit further. Now I'm stuck on how to get the values from the lines.
1

I think it is supposed to be bla = obj["followers_count"]

8 Comments

That gives me the error: KeyError: 'followers_count'
Can you print out all the keys in the dict by doing print(obj.keys()) and make sure the key you need is indeed present?
When I did print(obj), followers_count was present. But with the print(obj.keys()) you mentioned, it is NOT present.
It seems that followers_count might be a second-level key, nested inside another one. It would be much easier to give a proper answer if you posted the output of print(obj.keys()); otherwise it is just guesswork.
Sorry, this is the output of 1 of the lines: dict_keys(['contributors', 'truncated', 'text', 'is_quote_status', 'in_reply_to_status_id', 'id', 'favorite_count', 'source', 'retweeted', 'coordinates', 'entities', 'in_reply_to_screen_name', 'in_reply_to_user_id', 'retweet_count', 'id_str', 'favorited', 'retweeted_status', 'user', 'geo', 'in_reply_to_user_id_str', 'possibly_sensitive', 'lang', 'created_at', 'in_reply_to_status_id_str', 'place', 'extended_entities'])
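
Going by that key list, followers_count is not a top-level key, but 'user' is. In Twitter's tweet format the follower count normally sits inside the nested user object, and likes are the top-level favorite_count. A sketch under that assumption (extract_counts is a made-up helper name and the sample line is invented for illustration, not taken from the real file):

```python
import json


def extract_counts(line):
    """Return (followers_count, favorite_count) for one JSON line.

    Assumes followers_count is nested under the 'user' key; a missing
    key yields None instead of raising KeyError.
    """
    obj = json.loads(line)
    user = obj.get("user") or {}  # tolerate a missing or null user object
    return user.get("followers_count"), obj.get("favorite_count")


# Invented sample line, only to show the nesting.
sample = '{"favorite_count": 3, "user": {"followers_count": 1200}}'
print(extract_counts(sample))  # (1200, 3)
```

Inside the loop from the question, the equivalent would be bla = obj["user"]["followers_count"], with print(obj["user"].keys()) as a quick check that the key really is there.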