40

This XML file is named example.xml:

<?xml version="1.0"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

  <modelVersion>4.0.0</modelVersion>
  <groupId>com.foobar.flubber</groupId>
  <artifactId>uberportalconf</artifactId>
  <version>13-SNAPSHOT</version>
  <packaging>pom</packaging>
  <name>Environment for UberPortalConf</name>
  <description>This is the description</description>    
  <properties>
      <birduberportal.version>11</birduberportal.version>
      <promotiondevice.version>9</promotiondevice.version>
      <foobarportal.version>6</foobarportal.version>
      <eventuberdevice.version>2</eventuberdevice.version>
  </properties>
  <!-- A lot more here, but as it is irrelevant for the problem I have removed it -->
</project>

If I load example.xml and parse it with ElementTree I can see its namespace is http://maven.apache.org/POM/4.0.0.

>>> from xml.etree import ElementTree
>>> tree = ElementTree.parse('example.xml')
>>> print(tree.getroot())
<Element '{http://maven.apache.org/POM/4.0.0}project' at 0x26ee0f0>

I have not found a method that returns just the namespace of an Element without resorting to parsing str(an_element). It seems like there has got to be a better way.

1
  • Do you know how to use the find method in this case? It didn't work here... Commented May 5, 2012 at 3:48

10 Answers

35

This is a perfect task for a regular expression.

import re

def namespace(element):
    m = re.match(r'\{.*\}', element.tag)
    return m.group(0) if m else ''

7 Comments

After fighting with this issue for a while, this is the best solution I found. I can't believe the API doesn't give you a way to ask for the namespace and, at the same time, doesn't return the 'xmlns' attribute when calling 'rootElement.keys()'. I'm sure there is a good reason for that, but I can't find it at the moment.
Please add r before the regular expression to make this answer perfect.
@Jiu thank you so much. I can't believe I missed that.
to get namespace without curly braces included: re.match(r'\{(.*)\}', element.tag).group(1)
After adding r before the regex the \s are redundant.
32

The namespace should be in Element.tag right before the "actual" tag:

>>> root = tree.getroot()
>>> root.tag
'{http://maven.apache.org/POM/4.0.0}project'

To learn more about namespaces, take a look at ElementTree: Working with Namespaces and Qualified Names.
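
As a minimal sketch of putting that tag prefix to use (the inline XML and the prefix `m` are illustrative assumptions, not part of the question), the URI can be sliced out of `root.tag` and passed to `find` through a prefix map:

```python
import xml.etree.ElementTree as ET

# Illustrative stand-in for example.xml, trimmed to a few elements.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <artifactId>uberportalconf</artifactId>
  <description>This is the description</description>
</project>"""

root = ET.fromstring(POM)

# The tag has the form '{uri}localname'; take the text between the braces.
uri = root.tag[1:root.tag.index('}')] if root.tag.startswith('{') else ''

# 'm' is an arbitrary prefix chosen for this sketch; only the URI has to match.
ns = {'m': uri}
artifact = root.find('m:artifactId', ns)

print(uri)
print(artifact.text)
```

Without the prefix map, `root.find('artifactId')` returns None here, because the element's real tag includes the namespace URI.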

Comments

13

I am not sure if this is possible with xml.etree, but here is how you could do it with lxml.etree:

>>> from lxml import etree
>>> tree = etree.parse('example.xml')
>>> tree.xpath('namespace-uri(.)')
'http://maven.apache.org/POM/4.0.0'

7 Comments

I get "unresolved import: etree" using Python 2.7.2 on Windows. xpath wasn't available as a method when using xml.etree, and if I use find() (which supports xpath expressions) the 'namespace-uri(.)' statement still doesn't work.
this is exactly what i was looking for, see pr on gh
This has been the best solution that I've seen. I normally use xmlstarlet but I may switch now.
for lxml a simpler way to get the namespace is tree.getroot().nsmap
@Jona: I'd assume that using None is a way to address the default namespace, ie the one which is declared without a prefix.
11

Without using regular expressions:

>>> root
<Element '{http://www.google.com/schemas/sitemap/0.84}urlset' at 0x2f7cc10>

>>> root.tag.split('}')[0].strip('{')
'http://www.google.com/schemas/sitemap/0.84'
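
One caveat with the one-liner above: when the tag has no namespace, `split('}')[0].strip('{')` returns the tag name itself rather than an empty string. A small sketch that handles both cases, using a hypothetical helper name `split_tag`:

```python
def split_tag(tag):
    """Split '{uri}local' into (uri, local); uri is '' when there is no namespace."""
    if tag.startswith('{'):
        # Drop the leading brace, then cut at the first closing brace.
        uri, _, local = tag[1:].partition('}')
        return uri, local
    return '', tag

print(split_tag('{http://www.google.com/schemas/sitemap/0.84}urlset'))
print(split_tag('urlset'))
```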

1 Comment

a similar answer root.tag[1:root.tag.index('}')]
2

The lxml.etree library's elements have a dictionary called nsmap, which shows all the namespaces that are in use in the current tag scope.

>>> item = next(tree.getroot().iter())
>>> item.nsmap
{'md': 'urn:oasis:names:tc:SAML:2.0:metadata'}

Comments

2

The short answer is:

next(uri for uri, prefix in ElementTree._namespace_map.items() if prefix == '')

but only if you have been calling

ElementTree.register_namespace(prefix,uri)

in response to every event == "start-ns" received while iterating over the result of

ET.iterparse(...)

with "start-ns" included in the requested events.
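
The procedure described above can be sketched roughly as follows. The inline document and the helper name `default_namespace` are assumptions for illustration, and this version simply records the first default (empty-prefix) declaration it sees instead of going through `register_namespace` and the private map:

```python
import io
import xml.etree.ElementTree as ET

def default_namespace(source):
    """Return the URI of the first prefix-less namespace declaration, or None."""
    # With only 'start-ns' requested, each item is ('start-ns', (prefix, uri)).
    for _event, (prefix, uri) in ET.iterparse(source, events=['start-ns']):
        if prefix == '':
            return uri
    return None

# Hypothetical document; iterparse needs a filename or a file-like object.
doc = b'<a xmlns="http://A" xmlns:x="http://X"><x:b/></a>'
print(default_namespace(io.BytesIO(doc)))
```

Because a default namespace can be re-declared further down the tree, this only answers the question for the first declaration encountered.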

To answer the question "what is the default namespace?", it is necessary to clarify two points:

(1) XML specifications say that the default namespace is not necessarily global throughout the tree, rather the default namespace can be re-declared at any element under root, and inherits downwards until meeting another default namespace re-declaration.

(2) The ElementTree module can (de facto) handle XML-like documents which have no root default namespace, -if- they have no namespace use anywhere in the document. (* there may be less strict conditions, e.g., that is "if" and not necessarily "iff").

It's probably also worth considering "what do you want it for?" Consider that XML files can be semantically equivalent, but syntactically very different. E.g., the following three files are semantically equivalent, but A.xml has one default namespace declaration, B.xml has three, and C.xml has none.

A.xml:
<a xmlns="http://A" xmlns:nsB0="http://B0" xmlns:nsB1="http://B1">
     <nsB0:b/>
     <nsB1:b/>
</a>

B.xml:
<a xmlns="http://A">
     <b xmlns="http://B0"/>
     <b xmlns="http://B1"/>
</a>

C.xml:
<{http://A}a>
     <{http://B0}b/>
     <{http://B1}b/>
</{http://A}a>

The file C.xml is the canonical expanded syntactical representation presented to the ElementTree search functions.
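
This equivalence is easy to check. The sketch below uses inline strings standing in for A.xml and B.xml (with the attribute spelled `xmlns`), parses both, and compares the expanded tags that ElementTree produces; the helper name `expanded_tags` is made up for this example:

```python
import xml.etree.ElementTree as ET

A = """<a xmlns="http://A" xmlns:nsB0="http://B0" xmlns:nsB1="http://B1">
     <nsB0:b/>
     <nsB1:b/>
</a>"""

B = """<a xmlns="http://A">
     <b xmlns="http://B0"/>
     <b xmlns="http://B1"/>
</a>"""

def expanded_tags(xml_text):
    """List every element's tag in document order, in '{uri}local' form."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]

tags_a = expanded_tags(A)
tags_b = expanded_tags(B)
print(tags_a)
print(tags_a == tags_b)
```

Both documents yield the same list of `{uri}local` tags, which is the form shown in C.xml.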

If you are certain a priori that there will be no namespace collisions, you can modify the element tags while parsing as discussed here: Python ElementTree module: How to ignore the namespace of XML files to locate matching element when using the method "find", "findall"

Comments

1

I think it will be easier to take a look at the attributes:

>>> root.attrib
{'{http://www.w3.org/2001/XMLSchema-instance}schemaLocation':
   'http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd'}

2 Comments

Certainly easier than parsing str(the_element). But I guess parsing the_element.tag is even a bit easier. As I am only interested in the namespace. What do you think?
I think that @RikPoggi's answer seems the best one (actually, I upvoted it). In fact, getting the namespace should be as easy as re.search(r'\{(.*)\}', the_element.tag).group(1). With my answer it looks like you could use list(the_element.attrib.values())[0].split()[0], but it isn't very straightforward, and there is no guarantee that other attributes won't appear in the future.
0

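Here is my solution, using ElementTree on Python 3.9+:

import xml.etree.ElementTree as ET

def get_element_namespaces(filename, element):
    """Return the (prefix, uri) pairs declared on the first element in
    `filename` whose tag and attributes match `element`."""
    namespaces = []
    for event, value in ET.iterparse(filename, events=['start', 'start-ns']):
        if event == 'start-ns':
            # 'start-ns' events arrive just before the 'start' event of the declaring element
            namespaces.append(value)
        else:
            # At 'start' only the tag and attributes are guaranteed to be set,
            # so compare those rather than the serialized subtree
            if value.tag == element.tag and value.attrib == element.attrib:
                return namespaces
            namespaces = []
    return namespaces

This returns a list of (prefix, URI) tuples like this: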

[('android', 'http://schemas.android.com/apk/res/android'), ('tools', 'http://schemas.android.com/tools')]

1 Comment

This can't run because you have typos and no context in there.
-1

Combining some of the answers above, I think the shortest code is

theroot = tree.getroot()
theroot.attrib[list(theroot.keys())[0]]

1 Comment

This is not accurate, because the xmlns might not be the first attribute of the root. In fact, I'm currently trying to parse a TCX file and the xmlns isn't showing up as an attribute of the root at all.
-1

This is how you can get all root level namespaces. Credit goes to Google's AI Overview

import io
import xml.etree.ElementTree as ET

xml_data = """<?xml version="1.0"?>
<root xmlns:ns1="http://example.com/ns1" xmlns:ns2="http://example.com/ns2">
    <ns1:element1>Text 1</ns1:element1>
    <ns2:element2>Text 2</ns2:element2>
</root>
"""

# iterparse expects a filename or file object, so wrap the string in a buffer
source = io.BytesIO(xml_data.encode("utf-8"))

# Capture namespace declarations; each 'start-ns' event yields a (prefix, uri) tuple
namespaces = dict(node for _, node in ET.iterparse(source, events=['start-ns']))

# Print the collected namespaces
for prefix, uri in namespaces.items():
    print(f"Prefix: '{prefix}', URI: '{uri}'")

Comments
