How do I scrape website HTML after JavaScript has run?

Question

I was trying to scrape a website, but the result isn't the same as what I get when I right-click and choose "view page source" in Mozilla or Google Chrome.

The code I used:

import urllib

# Python 2: urllib.urlopen returns the response exactly as the server sent it
page = urllib.urlopen("http://www.google.com/search?q=python")
# or any other website that uses search
python = page.read()
print python

It turns out that this code only fetches the 'raw' web page, which isn't what I want. For websites like this, I want the HTML after JavaScript etc. has run, so that the result is the same as right-clicking and viewing the source in a browser.

Is there any other way of doing this?

Either look at browser automation using something like Selenium, or "headless browsing".
– Jon Clements, Commented Dec 17, 2013 at 14:31
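
To illustrate the browser-automation route suggested in the comment, here is a minimal sketch, assuming Python 3, the third-party Selenium package, and a local Firefox with its driver available. None of these appear in the original post, and the helper names rendered_html and fetch_with_firefox are hypothetical. The idea is to let a real browser load the page and run its scripts, then read back the resulting DOM.

```python
def rendered_html(driver, url):
    """Load url in an already-started browser and return the current DOM as HTML."""
    driver.get(url)            # the browser fetches the page and runs its scripts
    return driver.page_source  # serialized DOM as it stands at this moment


def fetch_with_firefox(url):
    """Start headless Firefox, grab the rendered HTML, and always shut it down.

    Assumes Selenium and Firefox are installed; this is an illustration,
    not the only way to drive a browser.
    """
    # Imported here so the module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # run without opening a window
    driver = webdriver.Firefox(options=options)
    try:
        return rendered_html(driver, url)
    finally:
        driver.quit()  # release the browser even if loading fails
```

Calling fetch_with_firefox("http://volt.al/") would return the page as the browser sees it once loaded. Pages that keep fetching data after the initial load may need an explicit wait before page_source is read, which this sketch leaves out.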

Salyangoz · Accepted Answer · 2013-12-17 13:04:23Z


It's not exactly a raw page; it's an error page that Google is sending you. The top of the output from print python says:

Your client does not have permission to get URL /search?q=python from this server.

If you change your page variable to

page = urllib.urlopen("http://volt.al/")

you'd see the JavaScript.

Try it out with different pages to see what you like.
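
The error page described above can also be spotted in code instead of by reading the printed output. In Python 2, urllib.urlopen hands back the body of an error page as if it were a normal response, while Python 3's urllib.request.urlopen raises HTTPError for statuses such as 403. The following is a minimal sketch, assuming Python 3. The helper name fetch and its opener parameter are hypothetical, added so the status check can be tried without a network connection.

```python
import urllib.error
import urllib.request


def fetch(url, opener=urllib.request.urlopen):
    """Return (status_code, body_text) without raising on HTTP error statuses."""
    try:
        with opener(url) as response:
            body = response.read()
            status = response.status
    except urllib.error.HTTPError as err:
        # Raised for statuses such as 403; the error page body is still readable.
        body = err.read()
        status = err.code
    return status, body.decode("utf-8", errors="replace")
```

For example, status, text = fetch("http://volt.al/") gives both values at once, and a status other than 200 suggests that text is an error page rather than the page you asked for. Like the original snippet, this does not run any JavaScript; it only shows what the server returned.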

