remove newline and whitespace parse XML with python Xpath

Question

Here is the xml file http://www.diveintopython3.net/examples/feed.xml

My code is

My result is

My questions are

how to remove the \n and the following white space in the text
how to get the node whose text is "dive into mark", how to search the text syntax

Padraic Cunningham · Accepted Answer · 2016-05-25 00:07:01Z

Just call normalize-space(.) on each node.

import lxml.etree as et

xml = et.parse("feed.xml")
ns = {"ns": 'http://www.w3.org/2005/Atom'}
for n in xml.xpath("//ns:category", namespaces=ns):
    t  = n.xpath("./../ns:summary", namespaces=ns)[0]
    print(t.xpath("normalize-space(.)"))

Output:

Putting an entire chapter on one page sounds bloated, but consider this &mdash; my longest chapter so far would be 75 printed pages, and it loads in under 5 seconds&hellip; On dialup.
Putting an entire chapter on one page sounds bloated, but consider this &mdash; my longest chapter so far would be 75 printed pages, and it loads in under 5 seconds&hellip; On dialup.
Putting an entire chapter on one page sounds bloated, but consider this &mdash; my longest chapter so far would be 75 printed pages, and it loads in under 5 seconds&hellip; On dialup.
The accessibility orthodoxy does not permit people to question the value of features that are rarely useful and rarely used.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.
These notes will eventually become part of a tech talk on video encoding.

All your newlines have been removed and multiple spaces replaced with a single space.

Part two of your question is asking for the title tag as that is the only tag with the text you are looking for, but to specifically find the title with that exact text, that is simply:

xml.xpath("//ns:title[text()='dive into mark']", namespaces=ns)

If you wanted any node that contained the text, you would just replace ns:title with a wildcard:

xml.xpath("//*[text()='dive into mark']", namespaces=ns)

Collectives™ on Stack Overflow

remove newline and whitespace parse XML with python Xpath

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related