I want to fetch the dynamic content of web pages. I have tried several Python modules, such as mechanize, urllib, and BS4, and have also used the simple_html_dom module in PHP, but none of them fetches the content of a dynamic page correctly.

I have tried this code:

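import urllib2

url = '<url>'
req = urllib2.Request(url)
f = urllib2.urlopen(req)
# Save the raw response body so it can be opened offline later.
a = open("E://<url>.html", "a")
for x in f:
    a.write(str(x))
a.close()  # flush the saved page to disk
f.close()
print "successful fetching"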

I then opened the saved file in a browser while offline, and it did not contain the content that appears when the page is loaded with an internet connection. I need to crawl such dynamic pages, which is not possible unless the complete HTML (the HTML a browser produces after opening the URL) is stored in a variable. These modules only fetch static content.

  • Could you please post an example of the code that you tried and what exactly it is you are trying to achieve? Commented May 20, 2015 at 13:48
  • I can (unsurprisingly) get that webpage with a 3-line Python "requests" script. Commented May 20, 2015 at 15:10

1 Answer

On modern websites that rely on JavaScript, this simplistic approach doesn't work. You either have to load all the JavaScript and execute it against the HTML you fetched or, the simpler solution, use a library that launches a real browser, such as Selenium.

That way the browser loads the page and executes all of the dynamic code. The only remaining problem is detecting when the page has stopped loading (JavaScript gives no general signal that it is finished). I normally look for some element I know to be dynamically loaded and retry, with increasing intervals, until it is there or I time out.
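
The retry-with-increasing-intervals idea could look roughly like the sketch below. The names `wait_for` and `find_dynamic_element`, the default timings, and the `#results` selector in the usage comment are all made up for illustration; only `find_elements` and `By.CSS_SELECTOR` come from Selenium itself.

```python
import time


def wait_for(probe, timeout=30.0, first_delay=0.5, factor=2.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns something truthy, sleeping longer
    after each failed attempt, and give up once `timeout` seconds pass.

    `sleep` and `clock` are parameters only so the loop can be exercised
    without real waiting; the defaults are the normal time functions.
    """
    deadline = clock() + timeout
    delay = first_delay
    while True:
        result = probe()
        if result:
            return result
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError("dynamic content did not appear in time")
        sleep(min(delay, remaining))  # never sleep past the deadline
        delay *= factor               # increase the interval each round


def find_dynamic_element(driver, css_selector):
    """Hypothetical probe: first element matching the selector, or None."""
    from selenium.webdriver.common.by import By  # needs Selenium installed
    matches = driver.find_elements(By.CSS_SELECTOR, css_selector)
    return matches[0] if matches else None


# Usage sketch (assumes Selenium plus a browser driver are installed;
# the selector is a placeholder for an element you know loads late):
#
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   driver.get("<url>")
#   element = wait_for(lambda: find_dynamic_element(driver, "#results"))
```

Selenium also ships its own `WebDriverWait` helper, which polls at a fixed interval; the hand-written loop above is only there to show the growing back-off described in the answer.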

Once you decide enough dynamic content is there, you can start parsing the HTML with Selenium's built-in DOM search routines.
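
As a rough sketch of that last step, the two helpers below read from an already-loaded Selenium driver. The function names and the `div.result` selector are illustrative assumptions; `page_source`, `find_elements`, and `By.CSS_SELECTOR` are the Selenium pieces being relied on. `page_source` is probably what the question is after, since it returns the page as the browser currently holds it rather than the raw response.

```python
def rendered_html(driver):
    """Return the page as the browser currently sees it, as an HTML string."""
    return driver.page_source


def texts_matching(driver, css_selector):
    """Return the visible text of every element matching the CSS selector."""
    from selenium.webdriver.common.by import By  # needs Selenium installed
    elements = driver.find_elements(By.CSS_SELECTOR, css_selector)
    return [element.text for element in elements]


# Usage sketch (assumes Selenium plus a browser driver are installed;
# the selector is a placeholder):
#
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   try:
#       driver.get("<url>")
#       # ... wait here until the dynamic content has appeared ...
#       html = rendered_html(driver)          # whole rendered page in a variable
#       rows = texts_matching(driver, "div.result")
#   finally:
#       driver.quit()                         # always close the browser
```

If you would rather keep an existing BS4 parser, the string from `rendered_html` can be handed to it unchanged.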
