How to get xpath from root to particular element in python while parsing xml

Question

I want to list all the elements path in xml with respect to their root. for example

<A>
   <B>
     <C>Name</C>
     <D>Name</D>
   </B>
</A>

So i want to list them as :-

A/B/C
A/B/D

I am able to parse xml using "Element" Object of python but not able to extract xpath from it. Any help?

does it mean there should also be: A/B

zero
– zero

2018-04-27 06:52:32 +00:00
Commented Apr 27, 2018 at 6:52 — zero
– zero, Commented Apr 27, 2018 at 6:52
no only absolute path from root

placebo
– placebo

2018-04-27 07:05:00 +00:00
Commented Apr 27, 2018 at 7:05 — placebo
– placebo, Commented Apr 27, 2018 at 7:05

Virtuoz · Accepted Answer · 2019-02-01 14:29:59Z

1

One can construct a parent map of the parsed tree and then use it to construct a needed XPath:

import xml.etree.ElementTree as parser

def get_parent_map(root):
    return {c:p for p in root.iter() for c in p}

def extract_text_info(root, original_root):
    parent_map = get_parent_map(original_root)

    for child in root:
        if child.text is not None and len(child.text.strip()) > 0:
            c = child
            arr = []
            while c != original_root:
                arr.append(c.tag)
                c = parent_map[c]
            arr.append(original_root.tag)

            print('/'.join(arr[::-1]))
            print(child.text)

        extract_text_info(child, original_root)

Then we have

xml = """<A>
       <B>
         <C>Name</C>
         <D>Name</D>
       </B>
     </A> """

root = parser.fromstring(xml)
extract_text_info(root, root)

> A/B/C
> Name
> A/B/D
> Name

edited Feb 1, 2019 at 14:29

answered Feb 1, 2019 at 14:13

Virtuoz

9241 gold badge8 silver badges15 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

placebo · Accepted Answer · 2018-04-27 07:38:22Z

1

One Of the ways I figured out is through code.

 import xml.etree.ElementTree as ET


def parseXML(root,sm):
    sm = sm + "/" + root.tag[root.tag.rfind('}')+1:]
    for child in root:
      parseXML(child,sm)
    if len(list(root)) == 0:
      print(sm)

tree = ET.parse('test.xml')
root = tree.getroot()
parseXML(root,"")

Don't know if there is any inbuilt function for the same.

edited Apr 27, 2018 at 7:38

answered Apr 27, 2018 at 7:04

placebo

911 silver badge8 bronze badges

2 Comments

Jon Clements Over a year ago

Umm... are you able to use the lxml library? My xpath is rusty and I'm sure there's a better way, but you could try something like: ['/'.join(a.tag for a in el.xpath('.//ancestor::*')) for el in tree.xpath('//*[not(child::*)]')] - eg - you find all the leaf nodes (those with no children) - requery to get their complete list of ancestors, then join the node names. That'll give you ['A/B/C', 'A/B/D'] on your sample data.

Prashanth Over a year ago

Is there a way to get the value within the element as well while you are iterating ? But this code definitely helped iterate all the elements and get their paths ! Beautiful....

zero · Accepted Answer · 2018-04-27 07:50:57Z

sample.html

<A>
   <B>
     <C>Name1</C>
     <D>Name2</D>
   </B>
</A>

parse.py

from bs4 import BeautifulSoup

def get_root_elements(path_to_file):
    soup = BeautifulSoup(open(path_to_file), 'lxml')
    all_elements = soup.find_all()

    count_element_indices = [len(list(a.parents)) for a in all_elements]

    absolute_roots_index = min(
        (index for index, element in enumerate(count_element_indices)
            if element == max(count_element_indices)
        )
    )

    return all_elements[absolute_roots_index:]

def get_path(element):
    to_remove = ['[document]', 'body', 'html']
    path = [element.name] + [e.name for e in element.parents if e.name not in to_remove]

    return ' / '.join(path[::-1])

Python Shell

In [1]: file = 'path/to/sample.html'

In [2]: run parse.py

In [3]: roots = get_root_elements(file)

In [4]: print(roots)
[<c>Name1</c>, <d>Name2</d>]

In [4]: for root in roots:
   ...:    print(get_path(root))
a / b / c
a / b / d

Collectives™ on Stack Overflow

How to get xpath from root to particular element in python while parsing xml

3 Answers 3

Comments

2 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related