I was trying to scrape a website, and the result I get isn't the same as what I see when I right-click and choose "View Page Source" in Mozilla Firefox or Google Chrome.

The code I used:

# Python 2 (urllib.urlopen and the print statement)
import urllib

# or any other website that uses search
page = urllib.urlopen("http://www.google.com/search?q=python")
python = page.read()
print python

It turns out that the code only fetches the 'raw' web page, which isn't what I wanted. For websites like this, I want the HTML after JavaScript etc. has run, so that the result is the same as what I get by right-clicking and viewing the source in my browser.

Is there any other way of doing this?

  • Either look at browser automation using something like Selenium, or at "headless browsing". (Commented Dec 17, 2013 at 14:31)
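As a rough illustration of the browser-automation route mentioned in this comment, here is a minimal Python 3 sketch using Selenium. It assumes the `selenium` package, Firefox, and its driver are installed. The helper names `make_headless_firefox` and `rendered_html` are made up for this example. Note that `page_source` reflects the DOM at the moment it is read, so pages that load content asynchronously may need an explicit wait first.

```python
def make_headless_firefox():
    """Start Firefox without a visible window (assumes selenium and Firefox are installed)."""
    # Imported here so the module can be loaded even where Selenium is missing.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")
    return webdriver.Firefox(options=options)


def rendered_html(driver, url):
    """Load url in the given WebDriver and return the page as the browser currently sees it."""
    driver.get(url)
    # Unlike urllib, this is the DOM serialized after the browser has run the page's scripts.
    return driver.page_source


# Usage (needs a real browser, so it is left commented out):
# driver = make_headless_firefox()
# try:
#     print(rendered_html(driver, "http://www.google.com/search?q=python"))
# finally:
#     driver.quit()
```

Passing the driver in as an argument keeps the fetching logic independent of which browser is used, so Chrome or another WebDriver could be swapped in.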

1 Answer

It's not exactly a raw page: what you got back is an error page from Google. The output of print python says at the top of the message:

Your client does not have permission to get URL /search?q=python from this server.

If you were to change your page variable to

page = urllib.urlopen("http://volt.al/")

you'd see the page's JavaScript in the output.

Try it out with different pages to see which give you what you want.
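A common reason for a permission error like the one above is that the server rejects the default User-Agent that urllib sends. The following is a minimal Python 3 sketch (where `urllib.request.urlopen` replaces `urllib.urlopen`) that sends a browser-like header instead. The `USER_AGENT` value and the helper names `build_request` and `fetch` are assumptions for illustration, and there is no guarantee that any particular site will accept the request. This still returns only the raw HTML, without running any JavaScript.

```python
import urllib.request

# Hypothetical browser-like User-Agent string; any realistic value could be used.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"


def build_request(url):
    """Create a request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url):
    """Download url and return the raw (pre-JavaScript) HTML as text."""
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read().decode("utf-8", errors="replace")


# Usage (performs a network request, so it is left commented out):
# print(fetch("http://www.google.com/search?q=python"))
```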
