
I have tried this:

import requests
from lxml import html

s = requests.Session()  # assumed: s is a requests session
url = 'http://test.ir/'
content = s.get(url).content
tree = html.fromstring(content)
print([e.text_content() for e in tree.xpath('//div[@class="grouptext"]/text()[not(self:div)]')])
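That attempt cannot work as written for two reasons. First, `self:div` is parsed as an element name with an undefined namespace prefix `self`, not as the `self::` axis. Second, an expression ending in `text()` returns text nodes, which lxml hands back as strings, so they have no `text_content()` method. Because a text node can never be a `div`, the `not(self::div)` test is unnecessary anyway. The sketch below shows the text-node approach on hypothetical markup (the real page structure is an assumption here): the wanted text sits directly inside `<div class="grouptext">` and the unwanted part is inside a nested, unclassed `<div>`.

```python
from lxml import html

# Hypothetical markup: the real page structure is an assumption.
SAMPLE = """
<div class="grouptext" style="direction: rtl">
  Wanted text
  <div>Unwanted text</div>
</div>
"""

def direct_text(doc):
    """Return the non-blank text nodes that are direct children of a grouptext div."""
    # text() selects only text-node children, so the nested <div>'s own text is skipped.
    nodes = doc.xpath('//div[@class="grouptext"]/text()')
    # lxml returns text nodes as strings: strip them and drop whitespace-only ones.
    return [t.strip() for t in nodes if t.strip()]

tree = html.fromstring(SAMPLE)
print(direct_text(tree))  # ['Wanted text']
```

This only helps when the unwanted content lives in child elements; if it is a second `grouptext` div, the positional or attribute filters in the answer are the better fit.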

As you can see in the screenshot, I want only the selected part. [screenshot not available]

When I use:

print([e.text_content() for e in tree.xpath('//div[@class="grouptext"]')])

the result contains the selected part, but also other content of <div class="grouptext">.

1 Answer


Assuming that you only want the text of the first occurrence of the <div> tag, you have to be more specific in your XPath expression. Either tell the system explicitly that you want the first one by adding [1]:

print([e.text_content() for e in tree.xpath('//div[@class="grouptext"][1]')])
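One detail worth checking: in `//div[@class="grouptext"][1]` the `[1]` belongs to the child step, so it matches every `grouptext` div that is the first such div among its siblings, which can be more than one element when the divs have different parents. Wrapping the path in parentheses, `(//div[@class="grouptext"])[1]`, applies `[1]` to the whole result set and returns only the first match in document order. A small comparison on hypothetical markup (two parent containers, an assumption about the page):

```python
from lxml import html

# Hypothetical markup with two separate parents (not the real page).
SAMPLE = """
<div>
  <div class="box">
    <div class="grouptext">First A</div>
    <div class="grouptext">Second A</div>
  </div>
  <div class="box">
    <div class="grouptext">First B</div>
  </div>
</div>
"""

tree = html.fromstring(SAMPLE)

def texts(expr):
    """Evaluate an XPath expression and return text_content() of each matched element."""
    return [e.text_content() for e in tree.xpath(expr)]

# [1] binds to the child step: the first grouptext div under *each* parent.
per_parent = texts('//div[@class="grouptext"][1]')
# Parentheses apply [1] to the whole node-set: the first match in the document.
first_overall = texts('(//div[@class="grouptext"])[1]')

print(per_parent)     # ['First A', 'First B']
print(first_overall)  # ['First A']
```

If all the `grouptext` divs share one parent, both forms give the same result.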

or you could select it by filtering on the style attribute:

print([e.text_content() for e in tree.xpath('//div[@class="grouptext" and @style]')])
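The attribute filter can be checked the same way. This sketch assumes, hypothetically, that only the wanted `<div>` carries a `style` attribute. In a predicate, `@style` is true whenever the attribute exists, regardless of its value, and `and` requires both conditions to hold.

```python
from lxml import html

# Hypothetical markup: only the wanted div has a style attribute.
SAMPLE = """
<div>
  <div class="grouptext" style="text-align: right">Wanted text</div>
  <div class="grouptext">Other text</div>
</div>
"""

tree = html.fromstring(SAMPLE)

# class must equal "grouptext" AND a style attribute must be present.
styled = [e.text_content() for e in tree.xpath('//div[@class="grouptext" and @style]')]
# For comparison: grouptext divs without a style attribute.
unstyled = [e.text_content() for e in tree.xpath('//div[@class="grouptext" and not(@style)]')]

print(styled)    # ['Wanted text']
print(unstyled)  # ['Other text']
```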

You will have to decide which is the better way to go. This depends on how the <div> tags appear in your HTML in the general case.
