Python: Read HTML source from URL and get data into program

Question

I'm a beginner at Python and I want to read information from a site and show some of the data in my textbox (I use EasyGUI). I have found the code below to get the HTML source of a URL, but now I want to work with the HTML output. I know how to work with XML, and I guess HTML is much the same. Is there any way to work with the elements and attributes?

import urllib

# Python 2: urllib.urlopen returns a file-like object
filehandle = urllib.urlopen('URL')

for line in filehandle.readlines():
    print line

filehandle.close()

Thanks in advance.

If you know how to work with XML, it's basically the same: parse the DOM. Check out BeautifulSoup or docs.python.org/library/htmlparser.html.
– Niclas Nilsson, Commented Mar 18, 2012 at 13:05

dm03514 · Accepted Answer · 2012-03-18 13:21:11Z

3

As suggested, Beautiful Soup is a library that can help you. Its documentation at http://www.crummy.com/software/BeautifulSoup/bs3/download/2.x/documentation.html shows a straightforward example.

from BeautifulSoup import BeautifulSoup

# filehandle is the object returned by urllib.urlopen in the question
soup = BeautifulSoup(filehandle.read())
titleTag = soup.html.head.title

Python has a built-in parser too: http://docs.python.org/library/htmlparser.html

BeautifulSoup is very good at handling broken HTML, though.
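
For the built-in parser route, here is a minimal sketch of reading elements and attributes without installing anything. It is written for Python 3, where the module is named html.parser (in Python 2 it is HTMLParser) and urlopen lives in urllib.request. The class name TitleAndLinkParser and the helpers parse_page and fetch_html are made up for illustration, and fetch_html assumes the page is UTF-8 encoded.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleAndLinkParser(HTMLParser):
    """Collect the <title> text and the href of every <a> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "title":
            self.in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_title:
            self.title += data


def parse_page(html_text):
    """Return (title, list_of_hrefs) for an HTML string."""
    parser = TitleAndLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), parser.links


def fetch_html(url):
    # Hypothetical helper: assumes the response body is UTF-8 text.
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


# Demonstration on an inline string, so no network access is needed.
sample = (
    "<html><head><title>Demo</title></head>"
    "<body><a href='/a'>A</a> <a href=\"/b\">B</a></body></html>"
)
title, links = parse_page(sample)
print(title, links)
```

To use it on a live page, pass the result of fetch_html('URL') to parse_page. The parser is event-based rather than a DOM, so for anything beyond picking out a few tags, or for badly broken markup, a library such as BeautifulSoup is likely to be more comfortable.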



Facundo Olano · Accepted Answer · 2012-03-18 13:37:55Z

0

If you're familiar with jQuery's syntax to select HTML elements, you may find pyquery useful.

