5

I'm new to Python. Here is my code, which works on Python 2.7.5:

import urllib2
import sys       

url = "http://mydomain.com"  # placeholder; urlopen needs the scheme
usock = urllib2.urlopen(url)
data = usock.read()
usock.close()

print data

This fetches the HTML markup, and it works.

What I want to do is get the value inside a <font class="big"></font> tag. For example, I need the value "Data" from this markup:

<font class="big">Data</font>

How can I do this?

1
  • font? Wow, that's really old and evil HTML. Commented Sep 6, 2013 at 11:56

2 Answers

9

You can use an HTML parser module such as BeautifulSoup:

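import urllib2
from bs4 import BeautifulSoup as BS

url = "http://mydomain.com"  # placeholder; urlopen needs the scheme
usock = urllib2.urlopen(url)
data = usock.read()
usock.close()

# Name the parser explicitly so bs4 does not have to guess one
soup = BS(data, 'html.parser')
# find() returns the first matching tag, or None if nothing matches
print soup.find('font', {'class': 'big'}).text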

This finds the first <font> tag with class="big" and prints its text content.
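If installing a third-party package is not an option, roughly the same lookup can be sketched with the standard library's HTMLParser. This is only a minimal sketch, not part of BeautifulSoup: the names BigFontParser and big_font_values are made up for illustration, and it assumes the markup is reasonably well-formed, with every <font> tag closed.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2.7


class BigFontParser(HTMLParser):
    """Collects the text found inside <font class="big"> elements.

    Hypothetical helper written for this example.
    """

    def __init__(self):
        HTMLParser.__init__(self)
        self.depth = 0    # > 0 while inside a matching <font> element
        self.values = []  # one string per matching element

    def handle_starttag(self, tag, attrs):
        if tag != 'font':
            return
        # The class attribute may hold several space-separated names
        classes = (dict(attrs).get('class') or '').split()
        if self.depth:
            self.depth += 1          # nested <font> inside a match
        elif 'big' in classes:
            self.depth = 1
            self.values.append('')   # start collecting a new value

    def handle_endtag(self, tag):
        if tag == 'font' and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.values[-1] += data


def big_font_values(html):
    """Return the text of every <font class="big"> element in html."""
    parser = BigFontParser()
    parser.feed(html)
    parser.close()
    return parser.values
```

With the data string read from urlopen above, big_font_values(data) would return a list with one entry per matching tag. On Python 3, read() returns bytes, so the data would need decoding first, for example data.decode('utf-8') if the page is UTF-8.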


2 Comments

ImportError: No module named bs4
@heron It's not in the standard library. Check the link I provided to find a download
1

Using lxml:

import urllib2
import lxml.html

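url = "http://mydomain.com"  # placeholder; urlopen needs the scheme

usock = urllib2.urlopen(url)
data = usock.read()
usock.close()
# cssselect() relies on the separate "cssselect" package in recent lxml releases
for font in lxml.html.fromstring(data).cssselect('font.big'):
    print font.text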

>>> import lxml.html
>>> root = lxml.html.fromstring('<font class="big">Data</font>')
>>> [font.text for font in root.cssselect('font.big')]
['Data']
