I'm trying to parse a web page that contains this:
<table style="width: 100%; border-top: 1px solid black; border-bottom: 1px solid black;">
<tr>
<td colspan="2"
style="border-top: 1px solid black; border-bottom: 1px solid black; background-color: #f0ffd3;">February 20, 2015</td>
</tr>
<tr>
<td style="border-top: 1px solid gray; font-weight: bold;">9:00 PM</td>
<td style="border-top: 1px solid gray; font-weight: bold">14°F</td>
</tr>
<tr>
<td style="border-bottom: 1px solid gray;">Clear<br />
Precip:
0 %<br />
Wind:
from the WSW at 6 mph
</td>
<td style="border-bottom: 1px solid gray;"><img class="wxicon" src="http://i.imwx.com/web/common/wxicons/31/31.gif"
style="border: 0px; padding: 0px 3px" /></td>
</tr>
<tr>
<td style="border-top: 1px solid gray; font-weight: bold;">10:00 PM</td>
<td style="border-top: 1px solid gray; font-weight: bold">13°F</td>
</tr>
<tr>
<td style="border-bottom: 1px solid gray;">Clear<br />
Precip:
0 %<br />
Wind:
from the WSW at 6 mph
</td>
<td style="border-bottom: 1px solid gray;"><img class="wxicon" src="http://i.imwx.com/web/common/wxicons/31/31.gif"
style="border: 0px; padding: 0px 3px" /></td>
</tr>
(It continues with more rows and ends with </table>.)

Here is my code so far:
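from lxml import html

# page holds the HTML source of the web page
tree = html.fromstring(page)
table = tree.xpath('//table/tr')
for item in table:
    for elem in item.xpath('*'):
        # the date header is the only cell with a colspan attribute
        if 'colspan' in html.tostring(elem):
            print '*', elem.text
        elif elem.text is not None:
            print elem.text,
        else:
            print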
This code somewhat works, but it does not get the text following each <br />, and it's far from elegant. How do I get the missing text? Any suggestions for improving the code would also be appreciated.
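
For context, here is a minimal sketch of one possible direction, assuming Python 3 and lxml. In lxml, elem.text only holds the text before the first child element; text that follows a child such as <br /> is stored in that child's .tail, and itertext() walks both. The parse_rows helper, the shortened sample table, and the "; " separator are my own assumptions for illustration, not part of the original page or code.

```python
from lxml import html

# Shortened copy of the table from the question (style attributes omitted).
page = """
<table>
<tr><td colspan="2">February 20, 2015</td></tr>
<tr><td>9:00 PM</td><td>14°F</td></tr>
<tr>
<td>Clear<br />
Precip:
0 %<br />
Wind:
from the WSW at 6 mph
</td>
<td><img class="wxicon" src="http://i.imwx.com/web/common/wxicons/31/31.gif" /></td>
</tr>
</table>
"""


def parse_rows(source):
    """Hypothetical helper: return one list of cell strings per table row."""
    tree = html.fromstring(source)
    rows = []
    for tr in tree.iter('tr'):
        cells = []
        for td in tr.findall('td'):
            # itertext() yields td.text and the .tail of every child,
            # so the text after each <br /> is included.
            parts = (" ".join(chunk.split()) for chunk in td.itertext())
            text = "; ".join(p for p in parts if p)
            # Check the attribute directly instead of searching the markup.
            if td.get('colspan'):
                text = '* ' + text
            cells.append(text)
        rows.append(cells)
    return rows


rows = parse_rows(page)
for cells in rows:
    print(" | ".join(cells))
```

Checking td.get('colspan') avoids serializing each cell with html.tostring just to look for an attribute, and collapsing whitespace per text chunk keeps the line breaks inside the cell from leaking into the output.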