
I want to get the XPath of each element in an XML file.

XML file:

<root 
xmlns="http://www.w3.org/TR/html4/"
xmlns:h="http://www.w3schools.com/furniture">

<table>
  <tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </tr>
</table>
</root>

Python code (since a null prefix for the default namespace is not allowed, I used my own prefix for it):

from lxml import etree

root = etree.parse("MyData.xml")
ns = {'df': 'http://www.w3.org/TR/html4/', 'types': 'http://www.w3schools.com/furniture'}
for e in root.iter():
    b = root.getpath(e)
    print(b)
    r = root.xpath(b, namespaces=ns)
    # I need both b and r here

The XPaths (output of b) look like this:

/*
/*/*[1]
/*/*[1]/*[1]
/*/*[1]/*[1]/h:td

I can't get the correct XPath for elements in the default namespace: their names show up as * instead. How can I get the correct XPath?

Answer:

You could use getelementpath, which always returns element names in Clark notation ({namespace-uri}localname), and then replace the namespace URIs with your prefixes manually:

x = """
<root 
xmlns="http://www.w3.org/TR/html4/"
xmlns:h="http://www.w3schools.com/furniture">

<table>
  <tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </tr>
</table>
</root>
"""

from lxml import etree

root = etree.fromstring(x).getroottree()
ns = {'df': 'http://www.w3.org/TR/html4/', 'types': 'http://www.w3schools.com/furniture'}

# getelementpath() is relative to the root element, so prepend the root's tag
root_path = '/' + root.getroot().tag

for e in root.iter():
    # Clark notation, e.g. '{http://www.w3.org/TR/html4/}table'; '.' for the root
    path = root.getelementpath(e)
    if path == '.':
        path = root_path
    else:
        path = root_path + '/' + path
    # Swap each '{namespace-uri}' for its prefix from ns
    for ns_key in ns:
        path = path.replace('{' + ns[ns_key] + '}', ns_key + ':')
    print(path)
    r = root.xpath(path, namespaces=ns)
    print(r)

Note that getelementpath returns paths relative to the root element, such as . for the root itself and {http://www.w3.org/TR/html4/}table (df:table after substitution) for its child, instead of /df:root and /df:root/df:table. That is why the code uses the tag of the root element to construct the full path manually.
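The replacement loop in the answer can also be packaged as a standalone function. This is a minimal sketch, assuming input paths in the format getelementpath() produces; the name to_prefixed_xpath is hypothetical. Unlike plain str.replace, it raises an error when an element's namespace has no prefix in ns, instead of silently leaving {uri} in the expression (which is not valid XPath 1.0 syntax).

```python
import re

# One Clark-notation namespace part: '{namespace-uri}'
CLARK_NS = re.compile(r"\{([^}]*)\}")


def to_prefixed_xpath(element_path, root_tag, ns):
    """Turn a getelementpath()-style path into an absolute, prefixed XPath.

    element_path -- path relative to the root, e.g. '.' or
                    '{http://www.w3.org/TR/html4/}table'
    root_tag     -- the root element's tag in Clark notation
    ns           -- prefix -> namespace-URI map, as passed to xpath()
    """
    uri_to_prefix = {uri: prefix for prefix, uri in ns.items()}

    def replace(match):
        uri = match.group(1)
        if uri not in uri_to_prefix:
            # Fail loudly rather than leave '{uri}' in the expression
            raise KeyError("no prefix in ns for namespace %r" % uri)
        return uri_to_prefix[uri] + ":"

    # '.' means the root element itself
    full = "/" + root_tag
    if element_path != ".":
        full += "/" + element_path
    return CLARK_NS.sub(replace, full)


ns = {'df': 'http://www.w3.org/TR/html4/',
      'types': 'http://www.w3schools.com/furniture'}
root_tag = '{http://www.w3.org/TR/html4/}root'
print(to_prefixed_xpath('.', root_tag, ns))  # → /df:root
print(to_prefixed_xpath('{http://www.w3.org/TR/html4/}table/'
                        '{http://www.w3.org/TR/html4/}tr/'
                        '{http://www.w3schools.com/furniture}td[2]',
                        root_tag, ns))  # → /df:root/df:table/df:tr/types:td[2]
```

Inside the answer's loop, this would replace the path-building lines with something like path = to_prefixed_xpath(root.getelementpath(e), root.getroot().tag, ns).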

Output:

/df:root
[<Element {http://www.w3.org/TR/html4/}root at 0x37f5348>]
/df:root/df:table
[<Element {http://www.w3.org/TR/html4/}table at 0x44bdb88>]
/df:root/df:table/df:tr
[<Element {http://www.w3.org/TR/html4/}tr at 0x37fa7c8>]
/df:root/df:table/df:tr/types:td[1]
[<Element {http://www.w3schools.com/furniture}td at 0x44bdac8>]
/df:root/df:table/df:tr/types:td[2]
[<Element {http://www.w3schools.com/furniture}td at 0x44bdb88>]
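If lxml isn't an option, the same paths can be produced with only the standard library's xml.etree.ElementTree. This is a different technique from the answer's: ElementTree has neither getpath() nor getelementpath(), so this sketch walks the tree itself and builds each prefixed step directly, adding a 1-based position only when siblings share a tag, which reproduces the paths in the output shown. The helper names prefixed and element_paths are hypothetical. ElementTree's find() supports only a limited XPath subset with relative paths, so the self-check drops the leading root step.

```python
import xml.etree.ElementTree as ET

XML = """<root xmlns="http://www.w3.org/TR/html4/"
      xmlns:h="http://www.w3schools.com/furniture">
<table>
  <tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </tr>
</table>
</root>"""

# Same prefix map as in the answer; 'df' stands in for the default namespace
NS = {'df': 'http://www.w3.org/TR/html4/',
      'types': 'http://www.w3schools.com/furniture'}
URI_TO_PREFIX = {uri: prefix for prefix, uri in NS.items()}


def prefixed(tag):
    """'{uri}local' -> 'prefix:local'; tags without a namespace pass through."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return URI_TO_PREFIX[uri] + ":" + local
    return tag


def element_paths(elem, path=None):
    """Yield (absolute prefixed path, element) for elem and its descendants."""
    if path is None:
        path = "/" + prefixed(elem.tag)
    yield path, elem
    children = list(elem)
    for child in children:
        step = prefixed(child.tag)
        same_tag = [c for c in children if c.tag == child.tag]
        if len(same_tag) > 1:
            # Position predicates are 1-based and only added for
            # siblings that share a tag
            step += "[%d]" % (same_tag.index(child) + 1)
        yield from element_paths(child, path + "/" + step)


root = ET.fromstring(XML)
for path, elem in element_paths(root):
    # find() only takes relative paths, so drop the leading '/df:root' step
    relative = "/".join(path.split("/")[2:]) or "."
    assert root.find(relative, NS) is elem
    print(path)
```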

Comments:

The code works fine, but instead of reading the XML from a string, I want to read it from an XML file, like open("MyData.xml",'r'). I don't know the right syntax to replace root = etree.fromstring(x).getroottree() so that it reads from a file. How do I do it?
@mariz To parse a file called MyData.xml, replace root = etree.fromstring(x).getroottree() with root = etree.parse('MyData.xml'). etree.parse() already returns an ElementTree, so no getroottree() call is needed. More info at: lxml.de/parsing.html
