6

I need to write a dynamic function that finds elements on a subtree of an ATOM xml by building dynamically the XPath to the element.

To do so, I've written something like this:

    tree = etree.parse(xmlFileUrl)
    e = etree.XPathEvaluator(tree, namespaces={'def':'http://www.w3.org/2005/Atom'})
    entries = e('//def:entry')
    for entry in entries:
        mypath = tree.getpath(entry) + "/category"
        category = e(mypath)

The code above fails to find "category" because getpath() returns an XPath without namespaces, whereas the XPathEvaluator e() requires namespaces.

Although I know I can use the path and provide a namespace in the call to XPathEvaluator, I would like to know if it's possible to make getpath() return a "fully qualified" path, using all the namespaces, as this is convenient in certain cases.

(This is a spin-off question of my earlier question: Python XpathEvaluator without namespace)

3 Answers 3

5

Basically, using the standard Python's xml.etree library, a different visit function is needed. To achieve this scope you can build a modified version of iter method like this:

def etree_iter_path(node, tag=None, path='.'):
    if tag == "*":
        tag = None
    if tag is None or node.tag == tag:
        yield node, path
    for child in node:
        _child_path = '%s/%s' % (path, child.tag)
        for child, child_path in etree_iter_path(child, tag, path=_child_path):
            yield child, child_path

Then you can use this function for the iteration of the tree from the root node:

from xml.etree import ElementTree

xmldoc = ElementTree.parse(*path to xml file*)
for elem, path in etree_iter_path(xmldoc.getroot()):
    print(elem, path)
Sign up to request clarification or add additional context in comments.

Comments

2

Rather than trying to construct a full path from the root, you can evaluate XPath expression on with the entry as the base node:

tree = etree.parse(xmlFileUrl)
nsmap = {'def':'http://www.w3.org/2005/Atom'}
entries_expr = etree.XPath('//def:entry', namespaces=nsmap)
category_expr = etree.XPath('category')
for entry in entries_expr(tree):
    category = category_expr(entry)

If performance is not critical, you can simplify the code by using the .xpath() method on elements rather than pre-compiled expressions:

tree = etree.parse(xmlFileUrl)
nsmap = {'def':'http://www.w3.org/2005/Atom'}
for entry in tree.xpath('//def:entry', namespaces=nsmap):
    category = entry.xpath('category')

1 Comment

Hi there, I managed to get this now. Probably building the full path is not really helpful. Btw, doesn't the "namespaces=nsmap" need a comma before it?
2

From the docs http://lxml.de/xpathxslt.html#the-xpath-class:

ElementTree objects have a method getpath(element), which returns a structural, absolute XPath expression to find that element:

So the answer to your question is that getpath() will not return a "fully qualified" path, since otherwise there would be an argument to the function for that, you are only guarenteed that the xpath expression returned will find you that element.

You may be able to combine getpath and xpath (and Xpath class) to do what you want though.

2 Comments

Hi @jmh, thanks. So if that's true, I hope someone will be able to help with the combination of xpath and getpath. However, I'm surprised that such a function is not provided.
There are examples in the documentation.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.