I have an HTML table like this:

<TABLE>
<TR>
    <TD><P>Name</P></TD>
    <TD><P>Fees</P></TD>
    <TD><P>Awards</P></TD>
    <TD><P>Total</P></TD>
</TR>
<TR>
    <TD><P>Tony</P></TD>
    <TD >7,800</TD>
    <TD >7</TD>
    <TD>15,400</TD>
</TR>
<TR>
    <TD><P>Paul</FONT></P></TD>
    <TD >7,800</TD>
    <TD >7</TD>
    <TD>15,400</TD>
</TR>
<TR>
    <TD><P>Richard</P></TD>
    <TD >7,800</TD>
    <TD >7</TD>
    <TD>15,400</TD>
</TR>

</TR>
</TABLE>

I want to extract the values from the table. I tried the following:

import lxml.html
html = lxml.html.parse(html_table)  # html_table: filename, URL, or file-like object
text_value = html.xpath('//tr/td/text()')
packages = html.xpath('//tr/td/p')
p_content = [p.text_content() for p in packages]

Is there any way to extract both the <p> text and the <td> text into a single list?

1 Answer

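You could use //text() instead of /text(): the double slash selects all descendant text nodes of each <td>, including those nested inside <p>: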

>>> doc = """<TABLE>
... <TR>
...     <TD><P>Name</P></TD>
...     <TD><P>Fees</P></TD>
...     <TD><P>Awards</P></TD>
...     <TD><P>Total</P></TD>
... </TR>
... <TR>
...     <TD><P>Tony</P></TD>
...     <TD >7,800</TD>
...     <TD >7</TD>
...     <TD>15,400</TD>
... </TR>
... <TR>
...     <TD><P>Paul</FONT></P></TD>
...     <TD >7,800</TD>
...     <TD >7</TD>
...     <TD>15,400</TD>
... </TR>
... <TR>
...     <TD><P>Richard</P></TD>
...     <TD >7,800</TD>
...     <TD >7</TD>
...     <TD>15,400</TD>
... </TR>
... 
... </TR>
... </TABLE>"""
>>> import lxml.html
>>> root = lxml.html.fromstring(doc)
>>> root.xpath('//tr/td//text()')
['Name', 'Fees', 'Awards', 'Total', 'Tony', '7,800', '7', '15,400', 'Paul', '7,800', '7', '15,400', 'Richard', '7,800', '7', '15,400']
>>> 
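
If you would rather keep the cells grouped by row, here is a minimal sketch of one way to do it. It assumes a shortened version of the table above (the sample data and the rows variable are only illustrative) and uses text_content() on each <td>, stripping surrounding whitespace:

```python
import lxml.html

# Shortened, illustrative version of the table from the question.
doc = """<table>
<tr><td><p>Name</p></td><td><p>Fees</p></td></tr>
<tr><td><p>Tony</p></td><td> 7,800 </td></tr>
</table>"""

root = lxml.html.fromstring(doc)

rows = []
for tr in root.xpath('//tr'):
    # text_content() joins all text inside the cell, whether or not it is wrapped in <p>
    cells = [td.text_content().strip() for td in tr.xpath('./td')]
    rows.append(cells)

print(rows)
```

Each inner list then corresponds to one <tr>, which may be easier to work with than a single flat list.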

If you have two tables in the document, you can first loop over the tables and then use a relative XPath expression (with a leading .) to get the descendant text nodes of each table:

>>> doc = """<TABLE>
... <TR>
...     <TD><P>Name</P></TD>
...     <TD><P>Fees</P></TD>
...     <TD><P>Awards</P></TD>
...     <TD><P>Total</P></TD>
... </TR>
... <TR>
...     <TD><P>Tony</P></TD>
...     <TD >7,800</TD>
...     <TD >7</TD>
...     <TD>15,400</TD>
... </TR>
... <TR>
...     <TD><P>Paul</FONT></P></TD>
...     <TD >7,800</TD>
...     <TD >7</TD>
...     <TD>15,400</TD>
... </TR>
... <TR>
...     <TD><P>Richard</P></TD>
...     <TD >7,800</TD>
...     <TD >7</TD>
...     <TD>15,400</TD>
... </TR>
... 
... </TR>
... </TABLE>
... <TABLE>
... <TR>
...     <TD><P>Name</P></TD>
...     <TD><P>Fees</P></TD>
...     <TD><P>Awards</P></TD>
...     <TD><P>Total</P></TD>
... </TR>
... <TR>
...     <TD><P>Tony</P></TD>
...     <TD >7,800</TD>
...     <TD >7</TD>
...     <TD>15,400</TD>
... </TR>
... <TR>
...     <TD><P>Paul</FONT></P></TD>
...     <TD >7,800</TD>
...     <TD >7</TD>
...     <TD>15,400</TD>
... </TR>
... <TR>
...     <TD><P>Richard</P></TD>
...     <TD >7,800</TD>
...     <TD >7</TD>
...     <TD>15,400</TD>
... </TR>
... 
... </TR>
... </TABLE>"""
>>> import lxml.html
>>> root = lxml.html.fromstring(doc)
>>> root.xpath('//tr/td//text()')
['Name', 'Fees', 'Awards', 'Total', 'Tony', '7,800', '7', '15,400', 'Paul', '7,800', '7', '15,400', 'Richard', '7,800', '7', '15,400', 'Name', 'Fees', 'Awards', 'Total', 'Tony', '7,800', '7', '15,400', 'Paul', '7,800', '7', '15,400', 'Richard', '7,800', '7', '15,400']
>>> for tbl in root.xpath('//table'):
...     elements = tbl.xpath('.//tr/td//text()')
...     print(elements)
... 
['Name', 'Fees', 'Awards', 'Total', 'Tony', '7,800', '7', '15,400', 'Paul', '7,800', '7', '15,400', 'Richard', '7,800', '7', '15,400']
['Name', 'Fees', 'Awards', 'Total', 'Tony', '7,800', '7', '15,400', 'Paul', '7,800', '7', '15,400', 'Richard', '7,800', '7', '15,400']
>>> 
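
To keep each table in its own list rather than printing it, the same loop can be written as a list comprehension. This is only a sketch under the assumption of two small sample tables wrapped in a <div>; the tables variable name is made up for illustration:

```python
import lxml.html

# Two small, illustrative tables; the real document may contain more.
doc = """<div>
<table><tr><td><p>Name</p></td><td><p>Fees</p></td></tr>
<tr><td><p>Tony</p></td><td>7,800</td></tr></table>
<table><tr><td><p>Name</p></td><td><p>Awards</p></td></tr>
<tr><td><p>Paul</p></td><td>7</td></tr></table>
</div>"""

root = lxml.html.fromstring(doc)

# One list of cell texts per <table>; the leading "." keeps the
# XPath relative to the current table instead of the whole document.
tables = [tbl.xpath('.//tr/td//text()') for tbl in root.xpath('//table')]

for cells in tables:
    print(cells)
```

Here tables[0] holds the text of the first table and tables[1] the text of the second.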
2 Comments

Thanks @paul t. for your answer. It worked fine. I have another question, which I asked as a comment on the question: "Is it possible to parse multiple tables in an HTML document into different lists using lxml?" Could you help me with this?
@kishorekdty, I just added an example that loops over the tables.