
I want to evaluate this XPath query with lxml in Python.

.//*[@id='content_top']/article/div/table/tbody/tr[5]/td/p/text()

I checked the XPath query in FirePath (the Firebug extension for XPath) and it works, but my Python code returns nothing. Here's the source.

from lxml import html
import requests

page = requests.get("http://www.scienzeetecnologie.uniparthenope.it/avvisi.html")
tree = html.fromstring(page.text)
avvisi = tree.xpath(".//*[@id='content_top']/article/div/table/tbody/tr[5]/td/p/text()")
print(avvisi)

The output is an empty list: [].

1 Answer


There is no actual <tbody> element in the source HTML; it is an element that the browser's HTML parser adds to the DOM.

Firebug displays the DOM, and I am guessing that FirePath, being a Firebug extension, works on this DOM rather than on the source HTML.

For a more detailed explanation of <tbody> and why Firebug displays it, see the answers to the SO questions "Why does firebug add <tbody> to <table>?" and "Why do browsers insert tbody element into table elements?".


In your case, removing <tbody> from the XPath should make it work, for example:

avvisi = tree.xpath(".//*[@id='content_top']/article/div/table/tr[5]/td/p/text()")
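
To see the difference without fetching the real page, here is a minimal sketch. The HTML string, its text content, and the variable names are made up for illustration (they are not taken from the site); the only assumption carried over from above is that the source markup has no <tbody> tag.

```python
from lxml import html

# Hypothetical stand-in for the page source: note there is no <tbody> tag.
SOURCE = """
<html><body>
  <div id="content_top">
    <table>
      <tr><td><p>first notice</p></td></tr>
      <tr><td><p>second notice</p></td></tr>
    </table>
  </div>
</body></html>
"""

tree = html.fromstring(SOURCE)

# lxml does not insert <tbody>, so a path copied from the browser DOM finds nothing.
with_tbody = tree.xpath(".//*[@id='content_top']/table/tbody/tr[2]/td/p/text()")
print(with_tbody)     # []

# Dropping the tbody step matches the markup as it is actually written.
without_tbody = tree.xpath(".//*[@id='content_top']/table/tr[2]/td/p/text()")
print(without_tbody)  # ['second notice']

# 'table//tr' matches rows whether or not a <tbody> sits in between.
either = tree.xpath(".//*[@id='content_top']/table//tr[2]/td/p/text()")
print(either)         # ['second notice']
```

The last form is handy when you do not know whether the source contains <tbody>, but be aware that table//tr also reaches rows of nested tables, so it can match more than intended.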

4 Comments

Thanks, man, you made my day! :) But why does the output list contain strange characters like \xa0? Is there a way to avoid printing them?
Print each element of the list separately. When you print the list itself, you get the repr() output of each string, which shows escapes such as \xa0.
Something like: for i in avvisi: print(i)
Glad I could be helpful.
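
On the \xa0 question from the comments: \xa0 is a non-breaking space (typically what &nbsp; becomes after parsing). A small sketch of both options, printing items one by one or replacing the character; the sample strings below are invented for illustration and stand in for whatever tree.xpath() returned.

```python
# Hypothetical sample standing in for the XPath result list.
avvisi = ["Orario\xa0lezioni", "\xa0Esami "]

# Printing the list shows each string's repr(), so the escape is visible.
print(avvisi)

# Printing each item shows the actual character, which renders as a space.
for text in avvisi:
    print(text)

# To get rid of it entirely, swap it for a normal space and trim the ends.
cleaned = [text.replace("\xa0", " ").strip() for text in avvisi]
print(cleaned)  # ['Orario lezioni', 'Esami']
```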
