I'm trying to download what is supposed to be a JSON file hosted in a GitHub repo. Here's the link.

The problem is that when I try to decode the JSON with Python, I get the following error:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

This sounds like malformed JSON, so I opened the file manually in an editor, and this is what I see:

[Screenshot of the file displayed as a tree structure]

This is not a JSON file, even though that is what it is supposed to be. Instead, I'm getting this tree-structured file. I need to load it into a pandas DataFrame. Could somebody please point me in the right direction? What am I doing wrong?

This is the code I used to get that file:

import urllib.request as r
from bs4 import BeautifulSoup as bs
import json

url = r.urlopen("https://raw.githubusercontent.com/aavail/ai-workflow-capstone/master/cs-train/invoices-2017-11.json")
content = url.read()
soup = bs(content)
newDictionary=json.loads(str(soup))

Thank you very much in advance


1 Answer

"Instead, I'm getting this tree-structured file."

Nope, I promise you're getting a JSON file ;). The tree-structured representation is your browser making the file look pretty for you. If you run curl -XGET -L <url>, you'll see what is very much a JSON string.

Pandas allows you to read JSON from a URL directly:

>>> import pandas as pd
>>> url = "https://raw.githubusercontent.com/aavail/ai-workflow-capstone/master/cs-train/invoices-2017-11.json"
>>> df = pd.read_json(url)
>>> df.head()
          country  customer_id invoice  price stream_id  times_viewed  year  month  day
0  United Kingdom      13085.0  489434   6.95     85048            12  2017     11   28
1  United Kingdom          NaN  489597   8.65     22130             1  2017     11   28
2  United Kingdom          NaN  489597   1.70     22132             6  2017     11   28
3  United Kingdom          NaN  489597   1.70     22133             4  2017     11   28
4  United Kingdom          NaN  489597   0.87     22134             1  2017     11   28
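
If you would rather keep the urllib approach from the question, here is a minimal sketch of one way to do it. The likely culprit in the original code is the BeautifulSoup step: it treats the response as HTML, so str(soup) probably no longer starts with valid JSON (the exact output depends on the parser in use), which would explain a failure at char 0. The helper names parse_json_bytes and fetch_json are hypothetical, the UTF-8 encoding is an assumption, and the sample payload is made up so the example runs offline.

```python
import json
import urllib.request


def parse_json_bytes(raw: bytes):
    """Decode a response body (assumed to be UTF-8) and parse it as JSON."""
    return json.loads(raw.decode("utf-8"))


def fetch_json(url: str):
    """Download `url` and parse the body as JSON, with no HTML parser in between."""
    with urllib.request.urlopen(url) as response:
        return parse_json_bytes(response.read())


# Offline demonstration with a made-up payload loosely shaped like the rows above.
sample = b'[{"country": "United Kingdom", "invoice": "489434", "price": 6.95}]'
records = parse_json_bytes(sample)
print(records[0]["price"])  # prints 6.95
```

With network access, fetch_json(url) would return the parsed data. Assuming the file holds a list of records, passing that result to pd.DataFrame should give a frame similar to the one above.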