XPath via lxml in Python has been making me run in circles. I can't get it to extract text from an HTML table despite having what I believe to be the correct XPath. I'm using Chrome to inspect and extract the XPath, then using it in my code.

Here is the HTML table taken directly from the page:

<div id="vehicle-detail-model-specs-container">
<table id="vehicle-detail-model-specs" class="table table-striped vdp-feature-table">
    <!-- Price -->
    <tr>
                <td><strong>Price:</strong></td>
                    <td>
                            <strong id="vehicle-detail-price" itemprop="price">$ 2,210.00</strong>            </td>
            </tr>
                    <!-- VIN -->
    <tr><td><strong>VIN</strong></td><td>&nbsp;*0343</td></tr>

    <!-- MILEAGE -->
    <tr><td><strong>Mileage</strong></td><td>0&nbsp;mi</td></tr>
</table>

I'm trying to extract the Mileage. The XPath I'm using is:

//*[@id="vehicle-detail-model-specs"]/tbody/tr[3]/td[2]

And the Python code that I'm using is:

import requests
from lxml import html

page = requests.get(URL)
tree = html.fromstring(page.content)

mileage = tree.xpath('//*[@id="vehicle-detail-model-specs"]/tbody/tr[3]/td[2]')
print(mileage)

Note: I've tried adding /text() to the end and I still get nothing back, just an empty list [].

What am I doing wrong and why am I not able to extract the table value from the above examples?

1 Answer

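As Amber has pointed out, you should omit the tbody part. Your XPath includes tbody, but there is no <tbody> tag in the HTML source for your table. Chrome adds <tbody> to the DOM when it parses a table, so an XPath copied from the inspector contains it, while lxml parses the raw HTML and does not add it.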
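A quick way to check this is to parse a trimmed copy of the table and look at the direct children of the <table> element as lxml sees them. This is only a sketch: the snippet string below is a shortened stand-in for the real page (an assumption that the structure matches), not the page itself.

```python
from lxml import html

# Shortened stand-in for the table in the question
# (assumption: same structure as the real page).
snippet = """
<table id="vehicle-detail-model-specs">
    <tr><td><strong>Price:</strong></td><td><strong>$ 2,210.00</strong></td></tr>
    <tr><td><strong>VIN</strong></td><td>&nbsp;*0343</td></tr>
    <tr><td><strong>Mileage</strong></td><td>0&nbsp;mi</td></tr>
</table>
"""

table = html.fromstring(snippet)

# Tags of the direct element children of <table>, as parsed by lxml.
child_tags = [child.tag for child in table]
print(child_tags)

# The same path with and without the tbody step.
with_tbody = table.xpath('//*[@id="vehicle-detail-model-specs"]/tbody/tr[3]/td[2]')
without_tbody = table.xpath('//*[@id="vehicle-detail-model-specs"]/tr[3]/td[2]')
print(len(with_tbody), len(without_tbody))
```

If the rows show up as direct children of the table and only the path without tbody matches, the tbody step is what produces the empty list.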

Using the html you posted, I am able to extract the mileage value with the following xpath:

tree.xpath('//*[@id="vehicle-detail-model-specs"]/tr[3]/td[2]')[0].text_content()
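Since a positional path like tr[3] breaks if a row is added or removed, it may be worth selecting the row by its label instead. The following is a minimal sketch under the assumption that the label cell reads exactly "Mileage"; the snippet string is again a shortened stand-in for the real page.

```python
from lxml import html

# Shortened stand-in for the table in the question
# (assumption: same structure as the real page).
snippet = """
<table id="vehicle-detail-model-specs">
    <tr><td><strong>Price:</strong></td><td><strong>$ 2,210.00</strong></td></tr>
    <tr><td><strong>VIN</strong></td><td>&nbsp;*0343</td></tr>
    <tr><td><strong>Mileage</strong></td><td>0&nbsp;mi</td></tr>
</table>
"""

tree = html.fromstring(snippet)

# Find the row whose first cell reads "Mileage", then take its second cell.
# The // before tr matches rows whether or not a <tbody> is present.
cells = tree.xpath(
    '//*[@id="vehicle-detail-model-specs"]'
    '//tr[normalize-space(td[1]) = "Mileage"]/td[2]'
)

# &nbsp; arrives as U+00A0 in the parsed text, so swap it for a plain space.
mileage = cells[0].text_content().replace("\xa0", " ").strip() if cells else None
print(mileage)
```

Checking that the result list is non-empty before indexing avoids an IndexError when the row is missing.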