Using Python's lxml, I can print the path of every element in a document by iterating over the tree:

from lxml import etree
root = etree.parse(xml_file)
for e in root.iter():
    path = root.getelementpath(e)
    print(path)

Results:

TreatmentEpisodes
TreatmentEpisodes/TreatmentEpisode
TreatmentEpisodes/TreatmentEpisode/SourceRecordIdentifier
TreatmentEpisodes/TreatmentEpisode/FederalTaxIdentifier
TreatmentEpisodes/TreatmentEpisode/ClientSourceRecordIdentifier
etc.

Note: I am working with this XSD: https://www.myflfamilies.com/service-programs/samh/155-2/155-2-v14/schemas/TreatmentEpisodeDataset.xsd

I want to do the same thing using import xml.etree.ElementTree as ET, but ElementTree does not seem to have an equivalent of lxml's getelementpath().

I've read the docs. I've googled for days. I've experimented with XPath. I've guessed using iter() and tried "getpath()", "Element.getpath()", etc. hoping to discover an undocumented feature. Fail.

Perhaps I am experiencing an extreme case of "user error" and please forgive me if this is a duplicate.

I thought I found the answer in "Get Xpath dynamically using ElementTree getpath()", but the XPathEvaluator only seems to operate on a 'known' element; it doesn't have an option for "give me everything".

Here is what I tried:

import xml.etree.ElementTree as ET
tree = etree.parse(xml_file)
for entry in tree.xpath('//TreatmentEpisode'):
    print(entry)

Results:

<Element TreatmentEpisode at 0xffff8f8c8a00>

What I was hoping for:

TreatmentEpisodes/TreatmentEpisode

However, even if I had received what I hoped for, I am still not sure how to obtain the full path for every element. As I understand the XPath docs, queries only operate on 'known' element names; that is, tree.xpath() seems to require the element name to be known beforehand.

  • It sounds like you did research and attempted code to solve it ... now provide at least what you "experimented with XPath", your code - even if failing: minimal reproducible example. So we can see how to adjust. Commented Jul 1, 2021 at 18:48
  • Fair. I can say this: I thought I found the answer here: stackoverflow.com/questions/13136334/… but the XPathEvaluator only seems to operate on a 'known' element - it doesn't have an option for "give me everything". However, I will put together more examples of what I tried and edit my question. Commented Jul 1, 2021 at 18:52
  • @hc_dev question updated with example attempt. Commented Jul 1, 2021 at 20:31
  • Shouldn’t be too hard to write a function to build the path of an element. Did you try that? Commented Jul 1, 2021 at 20:51
  • I take that back - hadn’t realised there isn’t a parent attribute :-( However this shows a way to build a child->parent dictionary which can easily be the basis of getting the element path stackoverflow.com/questions/2170610/… Commented Jul 1, 2021 at 21:05
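
A minimal sketch of the child->parent dictionary idea mentioned in the last comment, assuming the whole document fits in memory. The helper name `element_paths` and the sample XML are made up for illustration; only `ET.fromstring`, `Element.iter()`, and iteration over an element's children are standard ElementTree features.

```python
import xml.etree.ElementTree as ET


def element_paths(root):
    """Yield (element, path) for every element under root, in document order."""
    # ElementTree elements carry no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for el in root.iter():
        parts = [el.tag]
        node = el
        # Walk upward until we reach the root, which has no entry in the map.
        while node in parent_of:
            node = parent_of[node]
            parts.append(node.tag)
        yield el, '/'.join(reversed(parts))


# Hypothetical sample input, not the real dataset.
sample = (
    "<TreatmentEpisodes><TreatmentEpisode>"
    "<SourceRecordIdentifier/><FederalTaxIdentifier/>"
    "</TreatmentEpisode></TreatmentEpisodes>"
)
root = ET.fromstring(sample)
paths = [path for _, path in element_paths(root)]
for path in paths:
    print(path)
```

Note that repeated sibling elements produce the same path string, since no positional index is added.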

1 Answer

Start from:

import xml.etree.ElementTree as et

An interesting way to solve your problem is to use iterparse, an incremental parser included in ElementTree.

It can report, among other things, a start and an end event for each element as it is parsed. For details, see the iterparse section of the xml.etree.ElementTree documentation.

The idea is to:

  1. Start with an empty list as the path.
  2. At the start event, append the element name to path and print the full path gathered so far.
  3. At the end event, drop the last element from path.

You can even wrap this code in a generator function:

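def pathGen(fn):
    # Stack of tag names from the root down to the current element.
    path = []
    it = et.iterparse(fn, events=('start', 'end'))
    for evt, el in it:
        if evt == 'start':
            # Entering an element: push its tag and report the full path.
            path.append(el.tag)
            yield '/'.join(path)
        else:
            # Leaving an element: pop its tag off the stack.
            path.pop()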

Now, when you run:

for pth in pathGen('Input.xml'):
    print(pth)

you will get a printout of full paths of all elements in your source file, something like:

TreatmentEpisodes
TreatmentEpisodes/TreatmentEpisode
TreatmentEpisodes/TreatmentEpisode/SourceRecordIdentifier
TreatmentEpisodes/TreatmentEpisode/FederalTaxIdentifier
TreatmentEpisodes/TreatmentEpisode/ClientSourceRecordIdentifier
TreatmentEpisodes/TreatmentEpisode
TreatmentEpisodes/TreatmentEpisode/SourceRecordIdentifier
TreatmentEpisodes/TreatmentEpisode/FederalTaxIdentifier
TreatmentEpisodes/TreatmentEpisode/ClientSourceRecordIdentifier
...
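
As the printout shows, the same path is repeated for every repeated element. If you only want each distinct path once, a sketch along these lines may help. It is a variation on the generator above, not part of the original answer: the name `iter_paths`, the sample bytes, and the `el.clear()` call (added here to release finished elements on large files) are my own assumptions.

```python
import io
import xml.etree.ElementTree as ET


def iter_paths(source):
    """Yield the slash-joined tag path of each element as it starts."""
    path = []
    for event, el in ET.iterparse(source, events=('start', 'end')):
        if event == 'start':
            path.append(el.tag)
            yield '/'.join(path)
        else:
            path.pop()
            # The element is complete; drop its children, text and attributes.
            el.clear()


# Hypothetical sample input; iterparse also accepts a filename.
sample = b"<a><b><c/></b><b><c/><d/></b></a>"
all_paths = list(iter_paths(io.BytesIO(sample)))

# dict.fromkeys keeps the first occurrence of each path, in document order.
unique_paths = list(dict.fromkeys(all_paths))
print(unique_paths)
```

If the document declares a namespace, ElementTree reports tags in the form {uri}name, so the paths will contain that prefix unless you strip it yourself.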

2 Comments

I would upvote your answer but I still don't have the minimum reputation to do so!
For future readers of this post, Valdi_Bo's answer works and also Davide Brunato's answer in this post: stackoverflow.com/questions/13136334/…
