
I can't get a JavaScript table with BeautifulSoup; it returns an empty array

I tried to get data from this page. https://www.hkex.com.hk/Mutual-Market/Stock-Connect/Statistics/Historical-Daily?sc_lang=en#select4=1&select5=2&select3=0&select2=3&select1=24

import requests, json
text = requests.get("https://www.hkex.com.hk/Mutual-Market/Stock-Connect/Statistics/Historical-Daily?sc_lang=en#select4=0&select5=2&select3=0&select2=3&select1=24")
data = json.loads(text)

print(data['Scty'])
  • Do you mean BeautifulSoup? Commented Apr 26, 2019 at 3:44
  • The webpage you are downloading contains HTML, not JSON, so Python throws an error. Also, where is your code that uses Beautiful Soup? Commented Apr 26, 2019 at 3:46
  • I started with BeautifulSoup, but it didn't work, so I tried JSON instead. Commented Apr 26, 2019 at 3:48
  • stackoverflow.com/questions/41054232/… Commented Apr 26, 2019 at 3:49
  • Which table on this page? Commented Apr 26, 2019 at 4:14
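Both points raised in the comments can be checked offline. In the question's code, `json.loads(text)` receives the `requests` `Response` object itself, which raises a `TypeError` (the body is `text.text`). Even with the body, the page is HTML, so `json.loads` raises `json.JSONDecodeError`. Also, everything after `#` in the URL is a fragment: the browser keeps it for the page's own JavaScript and it is not sent to the server, so the `select1` to `select5` values do not change what `requests` downloads. A minimal standard-library sketch; the HTML string is a made-up stand-in, not the real page, and `parse_json_or_none` is a hypothetical helper:

```python
import json
from urllib.parse import urlsplit

# The URL from the question; everything after '#' is a fragment.
url = ("https://www.hkex.com.hk/Mutual-Market/Stock-Connect/Statistics/"
       "Historical-Daily?sc_lang=en#select4=0&select5=2&select3=0&select2=3&select1=24")

parts = urlsplit(url)
print(parts.query)     # sc_lang=en (sent to the server)
print(parts.fragment)  # the select values (kept in the browser, never sent)


def parse_json_or_none(body: str):
    """Return the parsed JSON value, or None if body is not valid JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None


# Hypothetical stand-in for the HTML that requests.get(url).text returns.
html_body = "<!DOCTYPE html><html><head><title>HKEX</title></head></html>"
print(parse_json_or_none(html_body))       # None
print(parse_json_or_none('{"ok": true}'))  # {'ok': True}
```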

1 Answer


There is another URL you can use, which I found by looking at the browser's Network tab. The table is not in the page's HTML; the page loads it from a separate JavaScript file. With a little string manipulation on the response text you get a string that can be loaded with json, and it contains everything on the page (including the data for all 4 drop-down geographies). There is no need for bs4: you can extract everything you want with the json library.
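The string manipulation amounts to removing a JavaScript assignment so that only the JSON value is left. A slightly more defensive sketch, assuming the file has the form `tabData = <JSON>` with an optional trailing semicolon (whether the real file ends with `;` is an assumption). Unlike `str.replace`, it only strips the prefix at the start, so a `tabData = ` inside a string value is left alone. `js_assignment_to_json` is a hypothetical helper and the sample data is invented, not the real HKEX structure:

```python
import json


def js_assignment_to_json(js_text: str, var_name: str = "tabData"):
    """Parse text of the form '<var_name> = <JSON>;' into Python objects."""
    body = js_text.strip()
    prefix = f"{var_name} = "
    if not body.startswith(prefix):
        raise ValueError(f"expected the text to start with {prefix!r}")
    # Drop the assignment prefix and an optional trailing semicolon.
    body = body[len(prefix):].rstrip().rstrip(";")
    return json.loads(body)


# Hypothetical sample for illustration only; NOT the real HKEX data layout.
sample = 'tabData = [{"market": "example", "rows": [["a", 1], ["b", 2]]}];\n'
data = js_assignment_to_json(sample)
print(data[0]["rows"][0])  # ['a', 1]
```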


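import requests
import json

# JS file the page loads in the background (found in the Network tab)
r = requests.get('https://www.hkex.com.hk/eng/csm/DailyStat/data_tab_daily_20190425e.js?_=1556252093686')
r.raise_for_status()
# The body is "tabData = <JSON>"; dropping the assignment leaves plain JSON
data = json.loads(r.text.replace('tabData = ', ''))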
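The file name contains a date (`20190425`), and the `_` parameter (`1556252093686`) appears to be a millisecond Unix timestamp (it decodes to 26 April 2019, 04:14 UTC), which looks like a jQuery-style cache-buster rather than something the server needs. A sketch for building the URL for another day, assuming the pattern `data_tab_daily_<YYYYMMDD>e.js` holds in general; that pattern is inferred from this single example and is not documented by HKEX. `daily_stat_url` is a hypothetical helper:

```python
import time
from datetime import date

BASE = "https://www.hkex.com.hk/eng/csm/DailyStat/"


def daily_stat_url(day: date, cache_bust: bool = False) -> str:
    """Build the daily-stat URL for a trading day.

    ASSUMPTION: the file name pattern data_tab_daily_<YYYYMMDD>e.js is
    inferred from one example URL and may not hold for every date.
    """
    url = f"{BASE}data_tab_daily_{day:%Y%m%d}e.js"
    if cache_bust:
        # Mimic the '_=<milliseconds>' parameter seen in the example URL.
        url += f"?_={int(time.time() * 1000)}"
    return url


print(daily_stat_url(date(2019, 4, 25)))
# https://www.hkex.com.hk/eng/csm/DailyStat/data_tab_daily_20190425e.js
```

The result can be passed to `requests.get` in place of the hard-coded URL in the snippet above; non-trading days presumably have no file, so check the response status.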

For example, here is the path to the first row of the table on the landing page:

[Image: the JSON path to the first row of the table]
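Since the structure is deeply nested, two small helpers make it easier to find and follow such a path. This is a sketch, not part of the original answer: `outline` prints the paths and types of nested containers (looking only at the first element of each list), and `get_path` follows a list of dict keys and list indices with `functools.reduce`. Both names and the sample data are hypothetical; the real path into the HKEX data has to be read off `outline(data)` or the screenshot:

```python
from functools import reduce
from operator import getitem
from typing import Any, Iterable, Union

Key = Union[str, int]


def get_path(obj: Any, path: Iterable[Key]) -> Any:
    """Follow a sequence of dict keys / list indices into nested JSON."""
    return reduce(getitem, path, obj)


def outline(obj: Any, prefix: str = "", max_depth: int = 3) -> None:
    """Print the path and type of each nested value, to discover where rows live."""
    if max_depth < 0:
        return
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj[:1])  # only the first element of each list
    else:
        return
    for key, value in items:
        path = f"{prefix}[{key!r}]"
        print(path, type(value).__name__)
        outline(value, path, max_depth - 1)


# Hypothetical sample; NOT the real HKEX structure.
sample = [{"market": "example", "table": {"rows": [["a", 1], ["b", 2]]}}]
outline(sample)
print(get_path(sample, [0, "table", "rows", 0]))  # ['a', 1]
```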
