This question may sound easy, but I am having difficulty solving it. I have a table like the following:

<table><tbody>
<tr>
<td>2003</td>
<td><span class="positive">1.19</span> </td>
<td><span class="negative">-0.48</span> </td>
</tr>

My code is as follows:

from lxml import etree

for elem in tree.xpath('//*[@id="printcontent"]/div[8]/div/table/tbody/tr'):
    for c in elem.xpath("//td"):
        if c.getchildren():  # for the <span> thing
            text = c.xpath("//span/text()")
        else:
            text = c.text

But I am unable to iterate over the "td" elements. I have been trying all day, to no avail. I want to get 2003, 1.19, and -0.48.

Kindly help!

1 Answer

It looks like you have HTML, not XML. Therefore, use lxml.html, not lxml.etree, to parse the data. If data.html looks like this:

<table><tbody>
<tr>
<td>2003</td>
<td><span class="positive">1.19</span> </td>
<td><span class="negative">-0.48</span> </td>
</tr>

then

import lxml.html as LH
tree = LH.parse('data.html')
print([td.text_content() for td in tree.xpath('//td')])

yields

['2003', '1.19 ', '-0.48 ']
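
If you want the cells grouped row by row, as the nested loops in the question attempt, note that an XPath beginning with `//` is evaluated from the document root even when it is called on a row element, so `elem.xpath("//td")` returns every cell in the document. A leading dot (`.//td`) makes the search relative to the row. The following is only a minimal sketch: the HTML is inlined so it runs without data.html, the 2004 row is made up for illustration, and `rows_as_text` is a hypothetical helper name.

```python
import lxml.html as LH

# First row is the table from the question; the 2004 row is made up for illustration.
HTML = """
<table><tbody>
<tr>
<td>2003</td>
<td><span class="positive">1.19</span> </td>
<td><span class="negative">-0.48</span> </td>
</tr>
<tr>
<td>2004</td>
<td><span class="positive">0.75</span> </td>
<td><span class="negative">-0.12</span> </td>
</tr>
</tbody></table>
"""


def rows_as_text(html):
    """Return one list of stripped cell strings per <tr>."""
    root = LH.fromstring(html)
    rows = []
    for tr in root.xpath('//tr'):
        # './/td' starts at this row; a bare '//td' would start at the document root.
        cells = tr.xpath('.//td')
        rows.append([td.text_content().strip() for td in cells])
    return rows


print(rows_as_text(HTML))
# [['2003', '1.19', '-0.48'], ['2004', '0.75', '-0.12']]
```

The `.strip()` call drops the trailing space that sits after each `</span>` in the source, which is why the output above has `'1.19'` rather than `'1.19 '`.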

If

for elem in tree.xpath('//*[@id="printcontent"]/div[8]/div/table/tbody/tr'):

is not returning any elements, then you need to show us enough HTML to help us debug why this XPath is not working.
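
In the meantime, one way to narrow it down yourself is to evaluate the path one step at a time and see where it stops matching. This is a rough sketch, not a general XPath tool: `first_failing_prefix` is a hypothetical helper, the sample HTML is invented, and the naive split on `/` assumes the path starts with `//` and that no predicate contains a slash.

```python
import lxml.html as LH


def first_failing_prefix(root, xpath):
    """Return the shortest prefix of `xpath` that matches nothing.

    Returns None if the full path matches. Assumes a simple path that
    starts with '//' and has no '/' inside any predicate.
    """
    steps = xpath.split('/')  # '//a/b' -> ['', '', 'a', 'b']
    for end in range(3, len(steps) + 1):
        prefix = '/'.join(steps[:end])
        if not root.xpath(prefix):
            return prefix
    return None


# Invented sample: the container has only one child <div>, so div[8] cannot match.
SAMPLE = (
    '<div id="printcontent"><div><div>'
    '<table><tr><td>2003</td></tr></table>'
    '</div></div></div>'
)
root = LH.fromstring(SAMPLE)

print(first_failing_prefix(root, '//*[@id="printcontent"]/div[8]/div/table/tr'))
# //*[@id="printcontent"]/div[8]
print(first_failing_prefix(root, '//*[@id="printcontent"]/div/div/table/tr'))
# None
```

Positional steps such as `div[8]` depend on the exact page layout, so they are a reasonable first suspect when a long path copied from elsewhere returns nothing.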

1 Comment

bravo! Yes I made this XML - HTML mistake
