I have read that to scrape HTML that is rendered by JavaScript, I need to use Selenium with a webdriver such as PhantomJS. However, doing so still does not render the JavaScript for me. Below is a sample script.

Anyone?

from selenium import webdriver
import time

url="http://www.sgx.com/wps/portal/sgxweb/home/company_disclosure/stockfacts?page=2&code=5TG&lang=en-us"
PJ = r'/xxx/MyPythonScripts/phantomjs_mac'

driver = webdriver.PhantomJS(PJ)
driver.get(url)
time.sleep(3)
html=driver.page_source.encode('utf-8')
print html
  • What is the result of your script execution? Did you try to use Chrome or Firefox to visualize execution? Commented Feb 1, 2017 at 8:48
  • Well, I just tried searching for some text like 'Total Revenue', but found none. All of it is in JavaScript, which I don't really understand. I am using PhantomJS, not the Chrome or Firefox webdrivers. That said, I also tried the Chrome driver and the result is exactly the same. Commented Feb 1, 2017 at 9:10

1 Answer

Page content, as you've mentioned, is generated by JavaScript code, so you won't find it in the initial page source, and even adding time.sleep(3) might not be enough. You need to wait until the required data is present on the page. Try the code below:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url="http://www.sgx.com/wps/portal/sgxweb/home/company_disclosure/stockfacts?page=2&code=5TG&lang=en-us"
PJ = r'/xxx/MyPythonScripts/phantomjs_mac'

driver = webdriver.PhantomJS(PJ)
driver.get(url)

WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,'//div[starts-with(@id, "mainns_")]/iframe')))
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '//div[@class="data-point-container section-break"]/table')))

html = driver.page_source
assert "Total Revenue" in html

With this code, the driver waits up to 10 seconds for the iframe to become available and then up to 20 seconds for the table element to become visible (you can increase either timeout if you need). If a condition is not met within its timeout, you'll get a TimeoutException.
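
To make the difference from time.sleep clearer, here is a rough sketch of the polling idea behind an explicit wait. It is not Selenium's actual implementation: wait_until and WaitTimeout are hypothetical names made up for this illustration, and it assumes Python 3.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value as soon as it appears, or raises WaitTimeout
    once `timeout` seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # stop early: no need to sleep the full timeout
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %s seconds" % timeout)
        time.sleep(poll_interval)
```

Unlike a fixed time.sleep(3), a loop like this returns as soon as the condition holds and fails loudly when it never does. Note that it can only succeed if the condition is checked in the right document, which is why the iframe switch comes first in the code above.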

4 Comments

Hi Andersson, thanks~ I tried your method, but it just times out no matter what timeout I give. The code seems to do the same as time.sleep, except that it times out if it can't detect the element in question. However, the JavaScript still did not load. Curious whether you had a successful result using that script?
I didn't notice the iframe. Now it should work. Check the updated code.
It does indeed. Thanks! May I ask why time.sleep doesn't work? Even if I set it to 60 sec~
The target table is located inside an iframe, which is a kind of separate HTML document. To be able to handle it, you have to switch to that document first. Otherwise all of its elements are unavailable to the driver, even if you can see them and find them with HTML inspector tools.
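
As a small illustration of that switch, here is a sketch of a helper that reads the source of the document inside a frame and then returns to the top-level page. The function name page_source_in_frame is hypothetical. It assumes a driver that exposes Selenium's switch_to.frame, switch_to.default_content and page_source, and that page_source reflects the currently selected frame, as the answer's code relies on.

```python
def page_source_in_frame(driver, frame):
    """Return the HTML of the document loaded inside `frame`.

    `driver` is assumed to be a Selenium WebDriver (or any object with
    the same `switch_to` and `page_source` attributes). `frame` is
    whatever `switch_to.frame` accepts: an iframe element, name or index.
    """
    driver.switch_to.frame(frame)  # enter the iframe's own document
    try:
        return driver.page_source
    finally:
        # Always go back to the top-level page, even if reading fails.
        driver.switch_to.default_content()
```

This does not wait for the frame or its content to load, so in practice it would still be combined with the explicit waits shown in the answer.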
