27

Is there a way to define the default (unprefixed) namespace in Python's ElementTree? This doesn't seem to work:

ns = {"":"http://maven.apache.org/POM/4.0.0"}
pom = xml.etree.ElementTree.parse("pom.xml")
print(pom.findall("version", ns))

Nor does this:

ns = {None:"http://maven.apache.org/POM/4.0.0"}
pom = xml.etree.ElementTree.parse("pom.xml")
print(pom.findall("version", ns))

This does, but then I have to prefix every element:

ns = {"mvn":"http://maven.apache.org/POM/4.0.0"}
pom = xml.etree.ElementTree.parse("pom.xml")
print(pom.findall("mvn:version", ns))

Using Python 3.5 on OSX.

EDIT: if the answer is "no", you can still get the bounty :-). I just want a definitive "no" from someone who's spent a lot of time using it.

1
  • Using ElementTree, you have to use a prefix. If you use lxml, you can use .nsmap instead of hard-coding prefixes. See stackoverflow.com/questions/14853243/… for details Commented Feb 2, 2016 at 23:44

3 Answers

29
+100

NOTE: for Python 3.8+, please see this answer: stackoverflow.com/a/62398604/6705037


There is no straightforward way to handle default namespaces transparently. Assigning the default namespace a non-empty prefix is a common solution, as you've already mentioned:

ns = {"mvn":"http://maven.apache.org/POM/4.0.0"}
pom = xml.etree.ElementTree.parse("pom.xml")
print(pom.findall("mvn:version", ns))

Note that lxml.etree explicitly rejects an empty namespace prefix. You would get:

ValueError: empty namespace prefix is not supported in ElementPath


You can, however, make things simpler by removing the default namespace declaration while loading the XML input data:

import xml.etree.ElementTree as ET
import re
 
with open("pom.xml") as f:
    xmlstring = f.read()
 
# Remove the first default namespace declaration (xmlns="http://some/namespace").
# Note: this pattern only matches double-quoted attribute values.
xmlstring = re.sub(r'\sxmlns="[^"]+"', '', xmlstring, count=1)

pom = ET.fromstring(xmlstring)
print(pom.findall("version"))
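As a rough, self-contained sketch of the same idea, the variant below works on an inline string instead of a file and accepts either quote style around the namespace URI. The sample document, the `DEFAULT_NS` pattern name and the `strip_default_namespace` helper are illustrative assumptions, not part of ElementTree:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document standing in for pom.xml.
POM_XML = """<project xmlns='http://maven.apache.org/POM/4.0.0'
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <version>1.0.0</version>
</project>"""

# Match a default-namespace declaration in either quote style.
# Requiring "=" directly after "xmlns" leaves prefixed declarations
# such as xmlns:xsi untouched.
DEFAULT_NS = re.compile(r"""\sxmlns=(?:"[^"]+"|'[^']+')""")


def strip_default_namespace(xml_text):
    """Remove the first default-namespace declaration from raw XML text."""
    return DEFAULT_NS.sub("", xml_text, count=1)


root = ET.fromstring(strip_default_namespace(POM_XML))
versions = [el.text for el in root.findall("version")]
print(versions)
```

Because this edits the raw text before parsing, it only makes sense when the document has a single default namespace declared on the root element.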

6 Comments

To handle single quotes: r"""\s(xmlns="[^"]+"|\sxmlns='[^']+')"""
To fix @juloo65's regex: xmlstring = re.sub(r"""\s(xmlns="[^"]+"|xmlns='[^']+')""", '', xmlstring, count=1)
N.B.: "removing the default namespace definition while loading the XML input data" doesn't apply when you use html5lib to convert HTML-serialized documents to XHTML.
This should no longer be the accepted answer as of Python 3.8+. See stackoverflow.com/a/62398604/6705037
@delocalizer thanks, added a link to the top of the answer.
14

ElementTree in Python 3.8 and later allows the empty string as a prefix, which then stands for the default namespace, so you can declare:

ns = {'': 'http://maven.apache.org/POM/4.0.0'}

and pass that as the second argument (namespaces) to the find* methods.

Source: https://docs.python.org/3.8/library/xml.etree.elementtree.html?highlight=xml#xml.etree.ElementTree.Element.find
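For illustration, here is a minimal sketch of that approach on an inline document. The sample XML is an assumption standing in for pom.xml, and the code needs Python 3.8 or later:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document standing in for pom.xml.
POM_XML = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <version>1.0.0</version>
</project>"""

# The empty-string key maps unprefixed names in the path
# to the given namespace (Python 3.8+ only).
ns = {"": "http://maven.apache.org/POM/4.0.0"}

root = ET.fromstring(POM_XML)
versions = [el.text for el in root.findall("version", ns)]
print(versions)
```

On older interpreters the same call returns an empty list, which matches the behaviour described in the question.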

Comments

3

You can retrieve the namespace of the root element (usually, though not necessarily, the document's default namespace) with:

namespace = pom.getroot().tag.split("}")[0]+"}"

Then when you search for elements you add it to your search path:

print(pom.findall(namespace+"version"))

Not an elegant solution, but it works.
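A slightly more defensive sketch of the same trick follows. The `namespace_of` helper and the sample document are hypothetical names introduced for illustration; the helper returns an empty string when the tag carries no namespace, a case where the one-liner above would produce a malformed prefix:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document standing in for pom.xml.
POM_XML = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <version>1.0.0</version>
</project>"""


def namespace_of(element):
    """Return the '{uri}' part of element.tag, or '' if it has no namespace."""
    tag = element.tag
    if tag.startswith("{"):
        return tag[: tag.index("}") + 1]
    return ""


root = ET.fromstring(POM_XML)
namespace = namespace_of(root)

# ElementTree stores qualified names as '{uri}local', so the
# prefix can be concatenated directly onto the search path.
versions = [el.text for el in root.findall(namespace + "version")]
print(namespace, versions)
```

This still reads the namespace of the root element only, so elements declared in a different namespace further down the tree need their own lookup.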

2 Comments

Doesn't this give you the namespace of the root element? Which may or may not be the same as the default namespace?
@J.Beattie You may be correct; I may not be using the terms correctly.
