Parsing an HTML page with lxml in Python

Question

I want to evaluate this XPath query with lxml in Python.

.//*[@id='content_top']/article/div/table/tbody/tr[5]/td/p/text()

I checked the XPath query in FirePath (the Firebug extension for XPath) and it works, but my Python code returns nothing. Here is the source:

from lxml import html
import requests

page = requests.get("http://www.scienzeetecnologie.uniparthenope.it/avvisi.html")
tree = html.fromstring(page.text)
avvisi = tree.xpath(".//*[@id='content_top']/article/div/table/tbody/tr[5]/td/p/text()")
print(avvisi)

The output is an empty list: [].

Community · Accepted Answer · 2017-05-23 12:21:49Z

1

There is no actual <tbody> element in the source HTML; it is just an element that the browser's HTML parser adds to the DOM.

Firebug displays the DOM, and I am guessing that FirePath, being a Firebug extension, works on this DOM rather than on the source HTML.

For a more detailed explanation of <tbody> and why Firebug displays it, check the answers to the SO question "Why does firebug add <tbody> to <table>?" or to the question "Why do browsers insert tbody element into table elements?"
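A quick way to confirm this kind of mismatch is to ask lxml directly whether its tree contains any <tbody> element. The snippet below is a minimal sketch that uses a small inline HTML string as a hypothetical stand-in for the real page (the markup and the cell texts are assumptions, not the actual page content):

```python
from lxml import html

# Hypothetical stand-in for the page source: a table written without <tbody>.
SOURCE = (
    "<div id='content_top'>"
    "<table>"
    "<tr><td><p>first</p></td></tr>"
    "<tr><td><p>second</p></td></tr>"
    "</table>"
    "</div>"
)

tree = html.fromstring(SOURCE)

# lxml keeps the table as written, so its tree has no <tbody> element.
tbody_count = len(tree.xpath("//tbody"))

# A path that goes through tbody finds nothing; the same path without it matches.
with_tbody = tree.xpath("//*[@id='content_top']/table/tbody/tr/td/p/text()")
without_tbody = tree.xpath("//*[@id='content_top']/table/tr/td/p/text()")

print(tbody_count)
print(with_tbody)
print(without_tbody)
```

If the count is zero while the browser tools show a <tbody>, the element was added by the browser and should not appear in an XPath meant for lxml.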

In your case, removing tbody from the XPath should make it work. Example:

avvisi = tree.xpath(".//*[@id='content_top']/article/div/table/tr[5]/td/p/text()")
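If the same query has to work whether or not the source contains an explicit <tbody>, one option is to use the descendant axis (//) between the table and its rows. This is a sketch on two hypothetical inline tables (the id 't' and the cell texts are made up for illustration), not the real page:

```python
from lxml import html

# Two hypothetical sources: one without <tbody>, one with it written explicitly.
WITHOUT_TBODY = (
    "<table id='t'>"
    "<tr><td><p>a</p></td></tr>"
    "<tr><td><p>b</p></td></tr>"
    "</table>"
)
WITH_TBODY = (
    "<table id='t'><tbody>"
    "<tr><td><p>a</p></td></tr>"
    "<tr><td><p>b</p></td></tr>"
    "</tbody></table>"
)

# '//tr[2]' selects any descendant tr that is the second tr child of its parent,
# so it matches rows directly under table as well as rows under tbody.
QUERY = "//table[@id='t']//tr[2]/td/p/text()"

result_without = html.fromstring(WITHOUT_TBODY).xpath(QUERY)
result_with = html.fromstring(WITH_TBODY).xpath(QUERY)

print(result_without)
print(result_with)
```

The trade-off is that the descendant axis would also match rows of any table nested inside the target table, so the explicit path is safer when the structure is known.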

edited May 23, 2017 at 12:21


answered Aug 2, 2015 at 14:14

Anand S Kumar


4 Comments

cdm Over a year ago

Thanks man, you made my day! :) But why do I get these strange characters in the list output, like \xa0 or similar? Is there a way to avoid printing them?

Anand S Kumar Over a year ago

Print each element of the list separately. When you print the list itself, you get the repr() output of each string.

Anand S Kumar Over a year ago

Something like: for i in avvisi: print(i)
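The difference between the two ways of printing can be seen on a small sample. The strings below are hypothetical stand-ins for the scraped text; \xa0 is a non-breaking space (U+00A0), which repr() shows as an escape sequence. Replacing it with an ordinary space is an optional extra step, not something the answer above requires:

```python
# Hypothetical sample resembling text scraped from the page.
avvisi = ["Orario\xa0lezioni", "Aula\xa05"]

# Printing the list shows the repr() of each string, so \xa0 appears escaped.
print(avvisi)

# Printing each string separately writes the actual non-breaking space character.
for item in avvisi:
    print(item)

# Optional: replace non-breaking spaces with ordinary spaces.
cleaned = [item.replace("\xa0", " ").strip() for item in avvisi]
print(cleaned)
```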

Anand S Kumar Over a year ago

Glad I could be helpful.
