0

I am trying to Parse an XML file using elemenTree of Python. The xml file is like below:

<App xmlns="test attribute">
    <name>sagar</name>
</App>

Parser Code:

from xml.etree.ElementTree import ElementTree
from xml.etree.ElementTree import Element
import xml.etree.ElementTree as etree
def parser():
    eleTree = etree.parse('app.xml')
    eleRoot = eleTree.getroot()
    print("Tag:"+str(eleRoot.tag)+"\nAttrib:"+str(eleRoot.attrib))
if __name__ == "__main__":
    parser()

Output:

[sagar@linux Parser]$ python test.py
Tag:{test attribute}App  <------------- It should print only "App"
Attrib:{}

When I remove "xmlns" attribute or rename "xmlns" attribute to something else the eleRoot.tag is printing correct value. Why can't element tree unable to parse the tags properly when I have "xmlns" attribute in the tag. Am I missing some pre-requisite to parse an XML of this format using element tree?

4
  • I'd guess that xmlns attribute gets special handling, since it defines the namespace the tag comes from. Commented Jul 28, 2017 at 7:41
  • @Blckknght what is the special handling i need to do to make my code work. Commented Jul 28, 2017 at 8:50
  • The trouble isn't your code, but the xml. A small gotcha, that I just found here is that when an element has an xml namespace defined it's qualified name is "{namespace_url}Tag". Thus in your example you have defined the default namespace of "test attribute" which means that the fully qualified name of your App element is in fact "{test attribute}App", which is what you get out. I'll be modifying my answer below to accommodate this information. Commented Jul 28, 2017 at 9:02
  • I found one solution to parse the subtags:stackoverflow.com/questions/21531111/… Commented Jul 28, 2017 at 9:17

1 Answer 1

2

Your xml uses the attribute xmlns, which is trying to define a default xml namespace. Xml namespaces are used to solve naming conflicts, and require a valid URI for their value, as such the value of "test attribute" is invalid, which appears to be troubling the parsing of your xml by etree.

For more information on xml namespaces see XML Namespaces at W3 Schools.


Edit:

After looking into the issue further it appears that the fully qualified name of an element when using a python's ElementTree has the form {namespace_url}tag_name. This means that, as you defined the default namespace of "test attribute", the fully qualified name of your "App" tag is infact {test attribute}App, which is what you're getting out of your program.

Source

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.