3

Here is the xml file http://www.diveintopython3.net/examples/feed.xml

My code is enter image description here

My result is enter image description here

My questions are

  1. how to remove the \n and the following white space in the text

  2. how to get the node whose text is "dive into mark", how to search the text syntax

0

1 Answer 1

2

Just call normalize-space(.) on each node.

import lxml.etree as et

xml = et.parse("feed.xml")
ns = {"ns": 'http://www.w3.org/2005/Atom'}
for n in xml.xpath("//ns:category", namespaces=ns):
    t  = n.xpath("./../ns:summary", namespaces=ns)[0]
    print(t.xpath("normalize-space(.)"))

Output:

Putting an entire chapter on one page sounds bloated, but consider this — my longest chapter so far would be 75 printed pages, and it loads in under 5 seconds… On dialup.
Putting an entire chapter on one page sounds bloated, but consider this — my longest chapter so far would be 75 printed pages, and it loads in under 5 seconds… On dialup.
Putting an entire chapter on one page sounds bloated, but consider this — my longest chapter so far would be 75 printed pages, and it loads in under 5 seconds… On dialup.
The accessibility orthodoxy does not permit people to question the value of features that are rarely useful and rarely used.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.

All your newlines have been removed and multiple spaces replaced with a single space.

Part two of your question is asking for the title tag as that is the only tag with the text you are looking for, but to specifically find the title with that exact text, that is simply:

xml.xpath("//ns:title[text()='dive into mark']", namespaces=ns)

If you wanted any node that contained the text, you would just replace ns:title with a wildcard:

xml.xpath("//*[text()='dive into mark']", namespaces=ns)
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.