I'm having a problem with a JSON file (generated with Twython from the Twitter API).

The file looks like this:

[
{
    "created_at": "Thu Mar 14 20:24:53 +0000 2019",
    "id": 1106290123426140165,
    "id_str": "1106290123426140165",
    "text": "RT @ALABDULLATIF: n@B_Al3bdullatif \n\u278b\u2026",
    "source": "<a href=\"http://twitter.com/download/android\" 
     rel=\"nofollow\">Twitter for Android</a>",
    "truncated": false,
    "in_reply_to_status_id": null,
    "in_reply_to_status_id_str": null,
    "in_reply_to_user_id": null,
    "in_reply_to_user_id_str": null,
    "in_reply_to_screen_name": null,
    "user": {
        "id": 1091414851400929286,
        "id_str": "1091414851400929286",
        "name": "u064a",
        "screen_name": "UThbZ4nwsuzAMQm",
        "location": null,
        "url": null,
        "description": null,
        "translator_type": "none",
        "protected": false,
        "verified": false,
        "followers_count": 0,
        "friends_count": 0,
        "listed_count": 0,
        "favourites_count": 0,
        "statuses_count": 2,
        "created_at": "Fri Feb 01 19:15:52 +0000 2019",
        "utc_offset": null,
        "time_zone": null,
        "geo_enabled": false,
        "lang": "en",
        "contributors_enabled": false,
        "is_translator": false,
        "profile_background_color": "F5F8FA",
        ETC

When I try to read it with this:

fname = "tweets_03.json" 

text=[]
retweets=[]
language=[]
followers=[]

with open(fname, 'r') as f:
    for line in f:
        if not line.isspace():
            tweet = json.loads(line)
            text.append(tweet.get('text', ''))
            retweets.append(tweet.get('retweet_count',''))
            language.append(tweet.get('lang',''))
            followers.append(tweet.get('followers_count',''))

text=pd.DataFrame(text)
text.columns=['text']
retweets=pd.DataFrame(retweets)
retweets.columns=['retweets']
language=pd.DataFrame(language)
language.columns=['language']
followers=pd.DataFrame(followers)
followers.columns=['followers']

df=pd.concat([text,retweets,language,followers],axis=1)
df.head(5)

I get the following error message:

JSONDecodeError: Expecting value: line 2 column 1 (char 2)

I also tried:

data = "tweets_03.json" 
jdata = json.loads(data)
df = pd.DataFrame(jdata)

and that gives me the following error:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

If anyone could please help, it would be much appreciated. I want to convert the data into a DataFrame. Thank you, and best wishes.

6
  • Duplicate for stackoverflow.com/questions/21104592/json-to-pandas-dataframe Commented Mar 15, 2019 at 12:20
  • Possible duplicate of JSON to pandas DataFrame Commented Mar 15, 2019 at 12:25
  • When I try the above (# reading the JSON data using json.load(): file = 'data.json'; with open(file) as train_file: dict_train = json.load(train_file); # converting the JSON dataset from dictionary to DataFrame: train = pd.DataFrame.from_dict(dict_train, orient='index'); train.reset_index(level=0, inplace=True)), I now get the following error message: JSONDecodeError: Extra data: line 1938 column 2 (char 79714) Commented Mar 15, 2019 at 12:29
  • ... and when I check the content of line 1938, I see this: ][ Commented Mar 15, 2019 at 12:31
  • @tezzaaa I have updated the answer, please have a look Commented Mar 15, 2019 at 12:42

2 Answers

2

The issue is that your JSON file is actually several JSON arrays concatenated into one file. You need to separate them and load each one on its own.
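As a minimal illustration of that diagnosis, assuming a tiny made-up string in place of the real file, the sketch below shows how two arrays written back to back make json.loads fail with "Extra data", and that the error position points at the start of the second array. The name sample and its contents are hypothetical.

```python
import json

# Hypothetical stand-in for the file: two JSON arrays written back to back.
sample = '[{"id": 1}, {"id": 2}][{"id": 3}]'

error = None
try:
    json.loads(sample)
except json.JSONDecodeError as exc:
    # The parser finishes the first array, then finds more text after it.
    error = exc

print(error.msg)            # the message part of the JSONDecodeError
print(sample[error.pos:])   # everything from the start of the second array
```

The same thing happens at the ][ on line 1938 of the real file, which is why the file has to be split before parsing.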

I did that by finding every ][ boundary and splitting on it, then loading each piece and appending its tweets to one list that goes into a DataFrame. The result is still quite messy, because each tweet contains nested dictionaries and lists, but this will generate a DataFrame for you.

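import json

import pandas as pd

data = []
with open('tweets_03.json') as json_file:
    data_str = json_file.read()

# Drop the first '[' and the last ']' so that splitting on '][' leaves
# only the contents of each array.
data_str = data_str.split('[', 1)[-1]
data_str = data_str.rsplit(']', 1)[0]
# Caveat: this also splits on any '][' that appears inside a tweet's text.
chunks = data_str.split('][')

for jsonStr in chunks:
    # Restore the brackets so each chunk is a valid JSON array again.
    temp_data = json.loads('[' + jsonStr + ']')
    data.extend(temp_data)

df = pd.DataFrame(data)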

Output:

print (df)
    contributors                        ...                                                                       user
0           None                        ...                          {'id': 427643942, 'id_str': '427643942', 'name...
1           None                        ...                          {'id': 1063556070151528449, 'id_str': '1063556...
2           None                        ...                          {'id': 924769730606567424, 'id_str': '92476973...
3           None                        ...                          {'id': 287355962, 'id_str': '287355962', 'name...
4           None                        ...                          {'id': 2908153155, 'id_str': '2908153155', 'na...
5           None                        ...                          {'id': 1040181804026744832, 'id_str': '1040181...
6           None                        ...                          {'id': 397901665, 'id_str': '397901665', 'name...
7           None                        ...                          {'id': 14547327, 'id_str': '14547327', 'name':...
8           None                        ...                          {'id': 1159572698, 'id_str': '1159572698', 'na...
9           None                        ...                          {'id': 3025332991, 'id_str': '3025332991', 'na...
10          None                        ...                          {'id': 926921371065647104, 'id_str': '92692137...
11          None                        ...                          {'id': 428415680, 'id_str': '428415680', 'name...
12          None                        ...                          {'id': 1040967562442551301, 'id_str': '1040967...
13          None                        ...                          {'id': 984957304905744385, 'id_str': '98495730...
14          None                        ...                          {'id': 24174895, 'id_str': '24174895', 'name':...
15          None                        ...                          {'id': 543254812, 'id_str': '543254812', 'name...
16          None                        ...                          {'id': 377146136, 'id_str': '377146136', 'name...
17          None                        ...                          {'id': 63308004, 'id_str': '63308004', 'name':...
18          None                        ...                          {'id': 3039612566, 'id_str': '3039612566', 'na...
19          None                        ...                          {'id': 2902946418, 'id_str': '2902946418', 'na...
20          None                        ...                          {'id': 966776807830716416, 'id_str': '96677680...
21          None                        ...                          {'id': 1017086923507040256, 'id_str': '1017086...
22          None                        ...                          {'id': 888271500658081792, 'id_str': '88827150...
23          None                        ...                          {'id': 1085986810591932419, 'id_str': '1085986...
24          None                        ...                          {'id': 720061374999568384, 'id_str': '72006137...
25          None                        ...                          {'id': 21243436, 'id_str': '21243436', 'name':...
26          None                        ...                          {'id': 2849771796, 'id_str': '2849771796', 'na...
27          None                        ...                          {'id': 790823048744165376, 'id_str': '79082304...
28          None                        ...                          {'id': 881673927927496704, 'id_str': '88167392...
29          None                        ...                          {'id': 4344166641, 'id_str': '4344166641', 'na...
..           ...                        ...                                                                        ...
942         None                        ...                          {'id': 306237570, 'id_str': '306237570', 'name...
943         None                        ...                          {'id': 883298986739748864, 'id_str': '88329898...
944         None                        ...                          {'id': 3027274443, 'id_str': '3027274443', 'na...
945         None                        ...                          {'id': 3189578162, 'id_str': '3189578162', 'na...
946         None                        ...                          {'id': 2327121601, 'id_str': '2327121601', 'na...
947         None                        ...                          {'id': 990411876, 'id_str': '990411876', 'name...
948         None                        ...                          {'id': 2995641808, 'id_str': '2995641808', 'na...
949         None                        ...                          {'id': 44540580, 'id_str': '44540580', 'name':...
950         None                        ...                          {'id': 47636922, 'id_str': '47636922', 'name':...
951         None                        ...                          {'id': 996052119433048064, 'id_str': '99605211...
952         None                        ...                          {'id': 806255305474641920, 'id_str': '80625530...
953         None                        ...                          {'id': 66738256, 'id_str': '66738256', 'name':...
954         None                        ...                          {'id': 1068149370229542912, 'id_str': '1068149...
955         None                        ...                          {'id': 229965328, 'id_str': '229965328', 'name...
956         None                        ...                          {'id': 1039247810410016769, 'id_str': '1039247...
957         None                        ...                          {'id': 4886141236, 'id_str': '4886141236', 'na...
958         None                        ...                          {'id': 892138074, 'id_str': '892138074', 'name...
959         None                        ...                          {'id': 134945640, 'id_str': '134945640', 'name...
960         None                        ...                          {'id': 300694818, 'id_str': '300694818', 'name...
961         None                        ...                          {'id': 840240258, 'id_str': '840240258', 'name...
962         None                        ...                          {'id': 265481826, 'id_str': '265481826', 'name...
963         None                        ...                          {'id': 1082113676344098816, 'id_str': '1082113...
964         None                        ...                          {'id': 229965328, 'id_str': '229965328', 'name...
965         None                        ...                          {'id': 4634960663, 'id_str': '4634960663', 'na...
966         None                        ...                          {'id': 161350829, 'id_str': '161350829', 'name...
967         None                        ...                          {'id': 1003363328641716225, 'id_str': '1003363...
968         None                        ...                          {'id': 898601924630597636, 'id_str': '89860192...
969         None                        ...                          {'id': 3285036854, 'id_str': '3285036854', 'na...
970         None                        ...                          {'id': 1099846021952294912, 'id_str': '1099846...
971         None                        ...                          {'id': 34326169, 'id_str': '34326169', 'name':...

[972 rows x 36 columns]
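
If a tweet's text could itself contain ][, splitting on that string may cut a tweet in half. A possible alternative, sketched here under assumptions, is to let the standard library's json.JSONDecoder.raw_decode find where each array ends. The helper name iter_json_documents and the sample string are hypothetical. The last step also picks out flat columns, since in the sample shown in the question followers_count sits inside the nested user dictionary rather than at the top level of the tweet.

```python
import json


def iter_json_documents(text):
    """Yield each top-level JSON value that appears back to back in text."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so step over it here.
        if text[pos].isspace():
            pos += 1
            continue
        value, pos = decoder.raw_decode(text, pos)
        yield value


# Hypothetical stand-in for the file contents; note the '][' inside a tweet.
sample = (
    '[{"text": "a ][ b", "lang": "en", "user": {"followers_count": 5}}]\n'
    '[{"text": "c", "lang": "ar", "user": {"followers_count": 0}}]\n'
)

# Flatten the arrays into one list of tweet dictionaries.
tweets = [tweet for chunk in iter_json_documents(sample) for tweet in chunk]

# Keep only a few fields, reaching into the nested "user" dictionary.
rows = [
    {
        "text": tweet.get("text", ""),
        "retweets": tweet.get("retweet_count"),
        "language": tweet.get("lang", ""),
        "followers": tweet.get("user", {}).get("followers_count"),
    }
    for tweet in tweets
]
```

With the real file, the text would come from reading tweets_03.json, and pd.DataFrame(rows) should then give one column per key.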

2 Comments

Hi, I get the following error: ValueError: Trailing data
THANK YOU CHITOWN88 :-) YOU SAVED MY DAY
0
import pandas as pd

fileName = 'tweets_03.json'
# lines=True tells pandas to parse the file as JSON Lines (one JSON object per line)
jsonData = pd.read_json(fileName, lines=True)
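
For reference, here is a rough sketch of the one-object-per-line layout that lines=True expects, assuming the tweets are already available as a flat list of dictionaries. The records list and the tweets.jsonl file name are made up for illustration.

```python
import json
import os
import tempfile

# Hypothetical stand-in for a flat list of tweet dictionaries.
records = [
    {"id": 1, "text": "first\ntweet", "lang": "en"},
    {"id": 2, "text": "second", "lang": "ar"},
]

# Hypothetical output file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "tweets.jsonl")

with open(path, "w", encoding="utf-8") as out:
    for record in records:
        # json.dumps escapes newlines inside strings, so each record
        # stays on a single line of the file.
        out.write(json.dumps(record) + "\n")

# Reading line by line now works, because every line is a complete object.
with open(path, encoding="utf-8") as f:
    round_trip = [json.loads(line) for line in f if not line.isspace()]
```

A file in this layout is what a line-by-line loop with json.loads can parse; a pretty-printed file like the one in the question spreads each tweet over many lines, so it does not fit.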
