1

I cannot decode my request's reponse into a useful format. Here's the code:

import json
import requests

url = "https://www.tsx.com/json/company-directory/search/tsx/%5EC?callback=jQuery17109078120971266259_1565471114746&_=1565481666704"

r = requests.get(url)

I tried tinkering with these variations from ideas on the forums, but I get the following error message:

File "C:\Users\XXXXX\Anaconda3\lib\json\decoder.py", line 357, in raw_decode raise JSONDecodeError("Expecting value", s, err.value) from None JSONDecodeError: Expecting value

r = requests.get(url).json()

Or

json_data = json.loads(r.text)

with no success... To show what I am trying to decode: here are the first lines output of r.text:

In [71]: runfile('C:/python...)
b'jQuery17109078120971266259_1565471114739({"last_updated":1565340914,"length":158,"results":[{"symbol":"AUMB","name":"1911 Gold Corporation","instruments":[{"symbol":"AUMB","name":"1911 Gold Corporation"}]},{"symbol":"ALBS.P","name":"A-Labs Capital I Corp.","instruments":[{"symbol":"ALBS.P","name":"A-Labs Capital I Corp."}]},{"symbol":"ALAB.P","name":"A-Labs Capital II Corp.","instruments":[{"symbol":"ALAB.P","name":"A-Labs Capital II Corp."}]},...

Side note: I found the "AJAX" URL under "network" in the Google Chrome developer tools. These efforts are the result of me not being able to scrape a website using traditional BS4 and learning about requests ability to scrape AJAX content, if I can use the term AJAX content.

1 Answer 1

3

The reason for the error is because the request text is not actually a valid json format. It has some extra text which are the jquery call from the frontend.

With a little bit of regex you are able to actually extract the json from the results you just presented, and use as a json object in python:

  • (?:jQuery[0-9_]*\()(.+)(?:\);), which in essence means:

  • (?:jQuery[0-9_]*\() Starts with jQuery, has a big number with possible underscores (non-capturing);

  • (.+) Anything in between
  • (?:\);) Ends with a bracket and semi-colon (non-capturing)

Here the results:

import json
import re

r = requests.get(url)
results = json.loads(re.findall('(?:jQuery[0-9_]*\()(.+)(?:\);)', r.text)[0])

results
Out[1]:
{'last_updated': 1565340925,
 'length': 197,
 'results': [{'symbol': 'CLIQ',
   'name': 'Alcanna Inc.',
   'instruments': [{'symbol': 'CLIQ', 'name': 'Alcanna Inc.'},
    {'symbol': 'CLIQ.DB', 'name': 'Alcanna Inc 31JA22Db'}]},
...
Sign up to request clarification or add additional context in comments.

1 Comment

It works, no editing needed. Great response, huge thanks!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.