
I have an XML document; a small part of it looks like this:

<?xml version="1.0" ?>
<i:insert xmlns:i="urn:com:xml:insert" xmlns="urn:com:xml:data">
  <data>
    <image imageId="1"></image>
    <content>Content</content>
  </data>
</i:insert>

When I parse it using ElementTree and save it to a file, I see the following:

<ns0:insert xmlns:ns0="urn:com:xml:insert" xmlns:ns1="urn:com:xml:data">
  <ns1:data>
    <ns1:image imageId="1"></ns1:image>
    <ns1:content>Content</ns1:content>
  </ns1:data>
</ns0:insert>

Why does it change the prefixes and put them everywhere? With minidom I don't have this problem. Is this configurable? The documentation for ElementTree is very poor. The real problem is that I can't find any node after parsing this way. For example, I can't find image with or without a namespace, whether I search for {namespace}image or just image. Why is that? Any suggestions are appreciated.

What I have already tried:

import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
for a in root.findall('ns1:image'):
    print a.attrib

This raises an error, and the following variant returns nothing:

for a in root.findall('{urn:com:xml:data}image'):
    print a.attrib

I also tried defining a namespace map like this and passing it to findall:

namespaces = {'ns1': 'urn:com:xml:data'}
for a in root.findall('ns1:image', namespaces):
    print a.attrib

It returns nothing. What am I doing wrong?

  • Can you add the Python code which you are using to parse the XML? Commented Jan 10, 2015 at 0:25

2 Answers


This snippet from your question,

for a in root.findall('{urn:com:xml:data}image'):
    print a.attrib

does not output anything because it only looks for direct {urn:com:xml:data}image children of the root of the tree.

This slightly modified code,

for a in root.findall('.//{urn:com:xml:data}image'):
    print a.attrib

will print {'imageId': '1'} because it uses .//, which selects matching subelements on all levels.

Reference: https://docs.python.org/2/library/xml.etree.elementtree.html#supported-xpath-syntax.
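To make the difference concrete, here is a minimal self-contained sketch that parses the XML from the question out of a string (instead of test.xml) and compares the two searches. It also shows the namespaces-dictionary form from the question, which should work once the path starts with .//. The prefix d is an arbitrary name chosen for this sketch; it does not need to match anything in the file.

```python
import xml.etree.ElementTree as ET

XML = """<?xml version="1.0" ?>
<i:insert xmlns:i="urn:com:xml:insert" xmlns="urn:com:xml:data">
  <data>
    <image imageId="1"></image>
    <content>Content</content>
  </data>
</i:insert>"""

root = ET.fromstring(XML)

# Direct children of the root only: <image> sits inside <data>, so nothing matches.
direct = root.findall('{urn:com:xml:data}image')

# './/' searches every level below the root.
nested = root.findall('.//{urn:com:xml:data}image')

# Same search with a prefix map; 'd' is an arbitrary prefix (assumption of this sketch).
namespaces = {'d': 'urn:com:xml:data'}
prefixed = root.findall('.//d:image', namespaces)

print(len(direct))
print([el.attrib for el in nested])
print([el.attrib for el in prefixed])
```

The prefixes in the dictionary are only local shorthand for the URIs, which is why ns1 from the serialized output has no special meaning when searching.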


It is a bit annoying that ElementTree does not retain the original namespace prefixes by default, but keep in mind that the prefixes are not what matters; the namespace URIs are. The register_namespace() function can be used to set the desired prefix when serializing the XML. It has no effect on parsing or searching.
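As a rough sketch of that, the snippet below registers prefixes for the two URIs from the question and serializes the tree again. Passing an empty string as the prefix to get a default namespace is an assumption here: it works in the CPython versions I am aware of, but it is worth checking against your version. Note that register_namespace() changes a module-wide table, so it affects every later serialization in the process.

```python
import xml.etree.ElementTree as ET

XML = """<?xml version="1.0" ?>
<i:insert xmlns:i="urn:com:xml:insert" xmlns="urn:com:xml:data">
  <data>
    <image imageId="1"></image>
    <content>Content</content>
  </data>
</i:insert>"""

# Register the prefixes before serializing.
# An empty prefix is assumed to mean "use this URI as the default namespace".
ET.register_namespace('i', 'urn:com:xml:insert')
ET.register_namespace('', 'urn:com:xml:data')

root = ET.fromstring(XML)
serialized = ET.tostring(root).decode('ascii')
print(serialized)

# Searching is unaffected by the registration: the URI is still required.
found = root.findall('.//{urn:com:xml:data}image')
not_found = root.findall('.//image')
print(len(found), len(not_found))
```

This would also explain why registering the namespace alone does not make the original findall calls succeed.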



From what I gather, it has something to do with the namespace recognition in ET.

From http://effbot.org/zone/element-namespaces.htm:

When you save an Element tree to XML, the standard Element serializer generates unique prefixes for all URI:s that appear in the tree. The prefixes usually have the form “ns” followed by a number. For example, the above elements might be serialized with the prefix ns0 for “http://www.w3.org/1999/xhtml” and ns1 for “http://effbot.org/namespace/letters”.

If you want to use specific prefixes, you can add prefix/uri mappings to a global table in the ElementTree module. In 1.3 and later, you do this by calling the register_namespace function. In earlier versions, you can access the internal table directly:

ElementTree 1.3

ET.register_namespace(prefix, uri)

ElementTree 1.2 (Python 2.5)

ET._namespace_map[uri] = prefix

Note the argument order; the function takes the prefix first, while the raw dictionary maps from URI:s to prefixes.
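A small sketch of the argument order described in the quote, using the URI from the question. It peeks at _namespace_map only to illustrate the point; that dictionary is a private implementation detail of CPython's ElementTree module and should not be relied on in real code.

```python
import xml.etree.ElementTree as ET

# Public API: prefix first, then URI.
ET.register_namespace('i', 'urn:com:xml:insert')

# Private table (implementation detail): keyed by URI, value is the prefix.
mapped_prefix = ET._namespace_map['urn:com:xml:insert']
print(mapped_prefix)

# The registered prefix is used when an element in that namespace is serialized.
elem = ET.Element('{urn:com:xml:insert}insert')
serialized_elem = ET.tostring(elem).decode('ascii')
print(serialized_elem)
```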

1 Comment

I already read it and tried this namespace registration, but it didn't help.
