How to get the XPath of all elements in an XML file with a default namespace using Python?

Question

I want to get the XPath of each element in an XML file.

XML file:

<root 
xmlns="http://www.w3.org/TR/html4/"
xmlns:h="http://www.w3schools.com/furniture">

<table>
  <tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </tr>
</table>
</root>

Python code: since a null prefix is not allowed for the default namespace, I used my own prefix for it.

from lxml import etree
root = etree.parse("MyData.xml")
ns = {'df': 'http://www.w3.org/TR/html4/', 'types': 'http://www.w3schools.com/furniture'}
for e in root.iter():
    b = root.getpath(e)
    print(b)
    r = root.xpath(b, namespaces=ns)
    # I need both b and r here

The XPath looks like this (output of b):

/*
/*/*[1]
/*/*[1]/*[1]
/*/*[1]/*[1]/h:td

I can't get the correct XPath for elements in the default namespace; their names are shown as *. How can I get the XPath correctly?

Keith Hall · Accepted Answer · 2016-07-18 06:54:38Z

You could use getelementpath, which always returns the elements in Clark notation, and replace the namespaces manually:

x = """
<root 
xmlns="http://www.w3.org/TR/html4/"
xmlns:h="http://www.w3schools.com/furniture">

<table>
  <tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </tr>
</table>
</root>
"""

from lxml import etree 
root = etree.fromstring(x).getroottree()
ns = {'df': 'http://www.w3.org/TR/html4/', 'types': 'http://www.w3schools.com/furniture'}
for e in root.iter():
    path = root.getelementpath(e)
    root_path = '/' + root.getroot().tag
    if path == '.':
        path = root_path
    else:
        path = root_path + '/' + path
    for ns_key in ns:
        path = path.replace('{' + ns[ns_key] + '}', ns_key + ':')
    print(path)
    r = root.xpath(path, namespaces=ns)
    print(r)

As this example shows, getelementpath returns paths relative to the root element, like . and df:table (once the namespace is replaced) instead of /df:root and /df:root/df:table, so we use the tag of the root element to construct the full path manually.

Output:

/df:root
[<Element {http://www.w3.org/TR/html4/}root at 0x37f5348>]
/df:root/df:table
[<Element {http://www.w3.org/TR/html4/}table at 0x44bdb88>]
/df:root/df:table/df:tr
[<Element {http://www.w3.org/TR/html4/}tr at 0x37fa7c8>]
/df:root/df:table/df:tr/types:td[1]
[<Element {http://www.w3schools.com/furniture}td at 0x44bdac8>]
/df:root/df:table/df:tr/types:td[2]
[<Element {http://www.w3schools.com/furniture}td at 0x44bdb88>]
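
The str.replace loop above works for this document. As a variation, the substitution can be done in a single regular-expression pass that also fails loudly when a namespace has no prefix in ns. This is only a sketch: clark_to_prefixed and absolute_xpath are hypothetical helper names, not part of lxml, and they operate on plain strings of the kind getelementpath returns.

```python
import re

def clark_to_prefixed(path, ns):
    """Rewrite every '{uri}' in a Clark-notation path as 'prefix:'."""
    # Invert the prefix -> URI mapping so we can look up by URI.
    uri_to_prefix = {uri: prefix for prefix, uri in ns.items()}

    def swap(match):
        uri = match.group(1)
        if uri not in uri_to_prefix:
            raise KeyError("no prefix registered for namespace " + uri)
        return uri_to_prefix[uri] + ":"

    return re.sub(r"\{([^}]*)\}", swap, path)

def absolute_xpath(root_tag, relative_path, ns):
    """Join the root tag and a root-relative path, then apply the prefixes."""
    if relative_path == ".":
        full = "/" + root_tag
    else:
        full = "/" + root_tag + "/" + relative_path
    return clark_to_prefixed(full, ns)

ns = {'df': 'http://www.w3.org/TR/html4/', 'types': 'http://www.w3schools.com/furniture'}
root_tag = "{http://www.w3.org/TR/html4/}root"
relative = ("{http://www.w3.org/TR/html4/}table/"
            "{http://www.w3.org/TR/html4/}tr/"
            "{http://www.w3schools.com/furniture}td[2]")

print(absolute_xpath(root_tag, ".", ns))       # /df:root
print(absolute_xpath(root_tag, relative, ns))  # /df:root/df:table/df:tr/types:td[2]
```

In the loop above, root_tag would be root.getroot().tag and relative would be the value returned by root.getelementpath(e).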

2 Comments

mariz Over a year ago

The code works fine, but instead of reading the XML from a string, I want to read it from an XML file, like open("MyData.xml", 'r'). I don't know the exact syntax to make root = etree.fromstring(x).getroottree() read from a file. How do I do it?

Keith Hall Over a year ago

@mariz To parse a file called MyData.xml, replace root = etree.fromstring(x).getroottree() with root = etree.parse('MyData.xml'). More info at lxml.de/parsing.html
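
For reference, here is a minimal sketch of the file-based version, combining the answer's loop with etree.parse. The function name xpaths_for_file is hypothetical, and the temporary file merely stands in for MyData.xml so that the snippet runs on its own.

```python
import os
import tempfile
from lxml import etree

def xpaths_for_file(filename, ns):
    """Return a prefixed, absolute XPath for every element in the file."""
    tree = etree.parse(filename)  # parse() returns an ElementTree directly
    root_path = '/' + tree.getroot().tag
    paths = []
    for e in tree.iter():
        path = tree.getelementpath(e)  # '.' for the root, Clark notation otherwise
        path = root_path if path == '.' else root_path + '/' + path
        for prefix, uri in ns.items():
            path = path.replace('{' + uri + '}', prefix + ':')
        paths.append(path)
    return paths

ns = {'df': 'http://www.w3.org/TR/html4/', 'types': 'http://www.w3schools.com/furniture'}

# Stand-in for MyData.xml: write the sample document to a temporary file.
sample = b"""<root
xmlns="http://www.w3.org/TR/html4/"
xmlns:h="http://www.w3schools.com/furniture">
<table>
  <tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </tr>
</table>
</root>"""

with tempfile.NamedTemporaryFile(suffix='.xml', delete=False) as f:
    f.write(sample)
    filename = f.name
try:
    paths = xpaths_for_file(filename, ns)
finally:
    os.remove(filename)

for p in paths:
    print(p)
```

With a real file, the call would simply be xpaths_for_file('MyData.xml', ns).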
