How to print already interpreted HTML data in Python

Question

I have an HTML file with the following data structure:

<tr>
    <td valign="top"><img src="img.jpg"></td>
    <td><a href="file.zip">file.zip</a></td>
    <td align="right">24-Apr-2013 12:42 </td>
    <td align="right">200K</td>
</tr>
...

It's basically a simple table and when viewed in Firefox it looks like this:

file.zip   24-Apr-2013 12:42   200K

I want to extract these three values (file name, date, size). I could do it with split(), for example, but I am wondering whether it is possible to print "the HTML-interpreted form" of this in Python, something like:

import xyz
print(xyz.htmlinterpreted("htmlfile.html"))
file.zip   24-Apr-2013 12:42   200K

That way I could easily split the data with split(" "). Is this possible in Python?

Martijn Pieters · Accepted Answer · 2013-04-24 18:26:20Z

Use an HTML parser. BeautifulSoup makes this a breeze:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_source, "html.parser")
print(list(soup.stripped_strings))

Demo:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''<tr><td valign="top"><img src="img.jpg"></td><td><a href="file.zip">file.zip</a></td><td align="right">24-Apr-2013 12:42 </td><td align="right">200K</td></tr>''', "html.parser")
>>> print(list(soup.stripped_strings))
['file.zip', '24-Apr-2013 12:42', '200K']
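
If the file has several rows, it helps to keep the strings grouped per row rather than in one flat list. Below is a minimal sketch of that idea, assuming Python 3 and using the standard library's html.parser instead of BeautifulSoup, so it runs without installing anything. RowTextParser and extract_rows are illustrative names, not part of any library, and the sample markup is the row from the question.

```python
from html.parser import HTMLParser


class RowTextParser(HTMLParser):
    """Collect the non-empty text fragments of each <tr> as one list.

    Hypothetical helper for illustration; assumes rows are not nested.
    """

    def __init__(self):
        super().__init__()
        self.rows = []        # one list of strings per completed <tr>
        self._current = None  # fragments of the row being parsed, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current = []

    def handle_endtag(self, tag):
        if tag == "tr" and self._current is not None:
            self.rows.append(self._current)
            self._current = None

    def handle_data(self, data):
        # Skip whitespace-only text between tags; strip the rest.
        text = data.strip()
        if text and self._current is not None:
            self._current.append(text)


def extract_rows(html_source):
    """Return a list of rows, each a list of stripped cell strings."""
    parser = RowTextParser()
    parser.feed(html_source)
    parser.close()
    return parser.rows


sample = """
<tr>
    <td valign="top"><img src="img.jpg"></td>
    <td><a href="file.zip">file.zip</a></td>
    <td align="right">24-Apr-2013 12:42 </td>
    <td align="right">200K</td>
</tr>
"""

for name, date, size in extract_rows(sample):
    print(name, date, size)
```

Each row comes back as its own list, so the three values can be unpacked directly and there is no need to split on spaces, which would also break the date apart. The unpacking assumes every row has exactly three text cells.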
