How to get the string out of this HTML segment using Python

Question

I am using Python's BeautifulStoneSoup (from the BeautifulSoup library) to extract data from this web page. I am using this code segment to get a <li> object:

    import urllib2
    from BeautifulSoup import BeautifulStoneSoup

    req = urllib2.Request(url)
    req.add_header('User-Agent',
                   'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) '
                   'Gecko/2008092417 Firefox/3.0.3')

    response = urllib2.urlopen(req)
    link = response.read()
    response.close()

    soup = BeautifulStoneSoup(link, convertEntities=BeautifulStoneSoup.XML_ENTITIES)
    p = soup.find('ul',{"class":"vod_ordering"})

    j = 0
    while j < len(p('li')):
        li= p('li')[j]
        j = j+1

And now I want to break the <li> object down into its parts. I have no problem (that I know of) getting the icon, link and title, but I can't get the description, which sits between </strong> and </img> and does not belong to any tag apart from <li>.

I tried to use contents, but when I do this:

    print ''.join(li.contents)

I get an error:

    Error Contents: sequence item 1: expected string or Unicode, Tag found

How can I get that string?

dugres · Accepted Answer · 2011-08-30 11:50:04Z

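The join fails because li.contents holds Tag objects as well as strings, and join only accepts strings. I would try converting every item to a string first:

    print ''.join(map(str, li.contents))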
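The idea can be sketched without BeautifulSoup installed. In the snippet below, the Tag class is a hypothetical stand-in written only for illustration (it is not the real BeautifulSoup class), and the contents list is made-up sample data that mimics the mix of tags and bare strings found in li.contents. It shows why the plain join fails, how map(str, ...) avoids the error, and, as one possible way to get only the description, how to keep just the items that are plain strings.

```python
class Tag(object):
    """Hypothetical stand-in for a BeautifulSoup Tag (illustration only)."""

    def __init__(self, name, text):
        self.name = name
        self.text = text

    def __str__(self):
        # Render the tag as markup, roughly as str() on a real Tag would.
        return '<%s>%s</%s>' % (self.name, self.text, self.name)


# Made-up sample data mimicking li.contents: Tag objects mixed with strings.
contents = [Tag('strong', 'Title'), ' the description ', Tag('em', 'more')]

# Joining directly fails because join() accepts only strings.
try:
    ''.join(contents)
    error = None
except TypeError as exc:
    error = str(exc)

# Converting every item to a string first makes the join work.
joined = ''.join(map(str, contents))

# Keeping only the bare strings yields the text that sits outside any child tag.
description = ''.join(c for c in contents if isinstance(c, str)).strip()
```

With the real library, the bare text nodes are string subclasses, so a similar isinstance check (against basestring on Python 2) should pick out the description, though that is an assumption to verify against the actual page.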
