How can I fetch the page source of a webpage using Python?

Question

I wish to fetch the source of a webpage and parse individual tags myself. How can I do this in Python?

jgritty · Accepted Answer · 2011-11-05 06:37:47Z

3

import urllib2  # Python 2 module
# Fetch the page and read its raw HTML source
urllib2.urlopen('http://stackoverflow.com').read()

That's the simple answer, but for parsing the tags you should really look at BeautifulSoup:

http://www.crummy.com/software/BeautifulSoup/
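On Python 3, urllib2 no longer exists; its functionality lives in urllib.request. A minimal sketch of the same fetch follows. The function name fetch_source, the 10-second timeout, and the UTF-8 fallback are assumptions for illustration, not part of the original answer.

```python
from urllib.request import urlopen


def fetch_source(url, timeout=10):
    """Return the page source of `url` as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset declared in the Content-Type header if there is one;
        # falling back to UTF-8 is an assumption, not a guarantee.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_source('http://stackoverflow.com') should return the HTML as a str, which can then be handed to a parser.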

answered Nov 5, 2011 at 6:37

jgritty


David Alber · Accepted Answer · 2011-11-05 06:43:49Z

2

Some options are:

All except httplib2 and Beautiful Soup are in the Python Standard Library. The pages for each of the packages above contain simple examples that will let you see what suits your needs best.
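For parsing tags with the standard library alone, here is a rough sketch assuming Python 3, where the built-in HTML parser is the html.parser module. The names TagCollector and collect_tags are made up for this example.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag and its attributes (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.tags.append((tag, dict(attrs)))


def collect_tags(html):
    """Parse an HTML string and return a list of (tag, attributes) tuples."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags
```

Overriding handle_endtag and handle_data in the same way gives access to closing tags and text content.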

answered Nov 5, 2011 at 6:43

David Alber


Srikar Appalaraju · Accepted Answer · 2011-11-05 06:34:46Z

1

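I would suggest you use BeautifulSoup:

# For HTML parsing (BeautifulSoup 3 on Python 2)
from BeautifulSoup import BeautifulSoup
import urllib2

# Fetch the raw page source as a string
doc = urllib2.urlopen('http://google.com').read()

# read() already returns a single string, so no join is needed
soup = BeautifulSoup(doc)

# Inspect the first top-level node of the parsed document
soup.contents[0].name

After this you can parse pretty much anything out of the document. See the BeautifulSoup documentation, which has detailed examples of how to do it.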

answered Nov 5, 2011 at 6:34

Srikar Appalaraju


Jonathan Livni · Accepted Answer · 2011-11-05 14:56:00Z

1

All the answers here are correct, and BeautifulSoup is great. However, when the source HTML is generated dynamically by JavaScript, which is usually the case these days, you need an engine that first builds the final HTML and only then fetch it; otherwise most of the content will be missing.

As far as I know, the easiest way is to use a browser's engine for this. In my experience, Python + Selenium + Firefox is the path of least resistance.
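A rough sketch of that approach, assuming the selenium package, Firefox, and a matching driver are installed. The function name fetch_rendered_source is made up for this example.

```python
def fetch_rendered_source(url):
    """Return the HTML of `url` after the browser has loaded the page."""
    # Imported lazily so Selenium is only required when this is called.
    from selenium import webdriver

    driver = webdriver.Firefox()  # assumes Firefox and its driver are available
    try:
        driver.get(url)
        # page_source reflects the DOM at this moment; content that scripts
        # load later may need an explicit wait first.
        return driver.page_source
    finally:
        driver.quit()  # always close the browser
```

The returned string can be passed to BeautifulSoup or any other parser, just like the output of a plain HTTP fetch.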

edited Nov 5, 2011 at 14:56

answered Nov 5, 2011 at 6:50

Jonathan Livni

