
I'm a beginner at Python and I want to read data from a website and show some of it in a text box (I use EasyGUI). I found the code below to get the HTML source of a URL, but now I want to work with the HTML output. I know how to work with XML and I guess HTML is much the same. Is there a way to work with the elements and attributes?

import urllib

# Python 2: open the URL and print the HTML source line by line
filehandle = urllib.urlopen('URL')

for line in filehandle.readlines():
    print line

filehandle.close()

Thanks in advance.


2 Answers


As suggested, Beautiful Soup is a library that can help you. Its documentation, http://www.crummy.com/software/BeautifulSoup/bs3/download/2.x/documentation.html, shows a straightforward example:

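from BeautifulSoup import BeautifulSoup  # Beautiful Soup 3 import path

# filehandle is the object returned by urllib.urlopen() in the question
soup = BeautifulSoup(filehandle.read())
titleTag = soup.html.head.title  # the <title> element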

Python has a built-in parser too, the HTMLParser module: http://docs.python.org/library/htmlparser.html

Beautiful Soup is very good at handling broken HTML, though.
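
For the built-in parser, here is a minimal sketch of reading elements and attributes. It assumes Python 3, where the HTMLParser module mentioned above is named html.parser; the names LinkCollector, extract and sample are made up for illustration, and the sample string stands in for whatever filehandle.read() returns.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the page title and the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text between tags arrives here; keep it only while inside <title>.
        if self._in_title:
            self.title += data


def extract(html_text):
    """Hypothetical helper: return (title, list of hrefs) for an HTML string."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), parser.links


# Stand-in for the downloaded page source.
sample = (
    "<html><head><title>Demo page</title></head><body>"
    "<a href='/one'>One</a> <a href=\"/two\">Two</a>"
    "<a name='x'>no href</a></body></html>"
)

title, links = extract(sample)
print(title)
print(links)
```

The parser is event-driven, so you override a handler for each kind of thing you care about (start tags, end tags, text) rather than walking a tree as you would with Beautiful Soup.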



If you're familiar with jQuery's syntax to select HTML elements, you may find pyquery useful.
