Extracting the value of a specific HTML element using XPath in Python

Question

I have tried this:

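import requests
from lxml import html

s = requests.Session()  # assumed: the original snippet uses `s` without defining it
url = 'http://test.ir/'
content = s.get(url).content
tree = html.fromstring(content)
print([e.text_content() for e in tree.xpath('//div[@class="grouptext"]/text()[not(self:div)]')])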

As you can see in the picture, I want only the selected part.

When I use

print([e.text_content() for e in tree.xpath('//div[@class="grouptext"]')])

the result contains the selected part, but also the content of <div class="grouptext">.
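A minimal sketch of why the two attempts behave differently, run against hypothetical markup (the real page at http://test.ir/ is not shown, so the nested <div> below is an assumption and a string replaces the network fetch). text_content() returns the text of the element and all of its descendants, while an XPath ending in text() returns the element's own text nodes as plain strings, which have no text_content() method. Note also that the self axis is written with a double colon, self::div; and since text() only ever selects text nodes, a not(self::div) predicate on them would always be true anyway.

```python
from lxml import html

# Hypothetical markup standing in for the real page, which the question does not show.
SAMPLE = """<html><body>
<div class="grouptext">
  Wanted text
  <div>Nested text</div>
</div>
</body></html>"""

tree = html.fromstring(SAMPLE)

# An XPath ending in text() returns strings, not elements, so there is no
# text_content() to call: strip the strings and use them directly.
own_text = [t.strip()
            for t in tree.xpath('//div[@class="grouptext"]/text()')
            if t.strip()]  # drop whitespace-only text nodes

# text_content() on the element itself also includes the nested div's text.
full_text = tree.xpath('//div[@class="grouptext"]')[0].text_content()

print(own_text)
print(" ".join(full_text.split()))  # collapse whitespace for display
```

With this markup, own_text holds only the outer div's direct text, while full_text also contains "Nested text".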

Marcus Rickert · Accepted Answer · 2014-09-30 19:16:51Z


Assuming that you just want the text of the first occurrence of the <div> tag, you have to be more specific in your XPath expression. Either you tell the system explicitly that you want the first one by adding [1]:

print([e.text_content() for e in tree.xpath('//div[@class="grouptext"][1]')])
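One XPath subtlety worth checking against your page, sketched here on hypothetical markup (two grouptext <div> elements under different parents, which is an assumption): in //div[@class="grouptext"][1] the [1] belongs to the last step, so it selects every grouptext <div> that is the first such child of its parent, which can be more than one element. Wrapping the path in parentheses applies the position to the whole result set, giving the first match in document order.

```python
from lxml import html

# Hypothetical markup: two "grouptext" divs that live under *different* parents.
SAMPLE = """<html><body>
<div><div class="grouptext" style="color: red">Wanted text</div></div>
<div><div class="grouptext">Other text</div></div>
</body></html>"""

tree = html.fromstring(SAMPLE)

# [1] binds to the step: each grouptext div that is the first such child of its parent.
per_parent = [e.text_content() for e in tree.xpath('//div[@class="grouptext"][1]')]

# Parentheses apply [1] to the whole node-set: the first match in document order.
first_overall = [e.text_content() for e in tree.xpath('(//div[@class="grouptext"])[1]')]

print(per_parent)
print(first_overall)
```

When all grouptext <div> elements are siblings, both forms return the same single element; they only diverge when the matches have different parents.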

or you could select it by filtering for the style attribute:

print([e.text_content() for e in tree.xpath('//div[@class="grouptext" and @style]')])
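A similar sketch for the attribute filter, again on hypothetical markup in which the wanted <div> is assumed to be the only grouptext <div> carrying a style attribute. A bare @style inside a predicate only tests that the attribute exists, so neither its value nor the element's position matters; here the wanted <div> is deliberately placed second.

```python
from lxml import html

# Hypothetical markup: only the wanted div has a style attribute, and it is not first.
SAMPLE = """<html><body>
<div class="grouptext">Other text</div>
<div class="grouptext" style="color: red">Wanted text</div>
</body></html>"""

tree = html.fromstring(SAMPLE)

# @style is true whenever the attribute is present, whatever its value.
styled = [e.text_content() for e in tree.xpath('//div[@class="grouptext" and @style]')]

print(styled)
```

This keeps working if the order of the <div> elements changes, but it breaks if other grouptext <div> elements also gain a style attribute.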

You will have to decide which is the better way to go. That depends on how the <div> tags appear in your HTML in the more general case.
