Assume I have the following XML, which I want to modify using Python's ElementTree:

<root xmlns:prefix="URI">
  <child prefix:name="***"/>
  ...
</root> 

I modify the XML file like this:

import xml.etree.ElementTree as ET
tree = ET.parse('filename.xml')
# XML modification here
# save the modifications
tree.write('filename.xml')

Afterwards, the XML file looks like this:

<root xmlns:ns0="URI">
  <child ns0:name="***"/>
  ...
</root>

As you can see, the namespace prefix changed to ns0. I'm aware of ET.register_namespace(), as mentioned here.

The problems with ET.register_namespace() are:

  1. You need to know the prefix and URI in advance.
  2. It cannot be used with a default namespace.

For example, if the XML looks like:

<root xmlns="http://uri">
    <child name="name">
    ...
    </child>
</root>

It will be transformed into something like:

<ns0:root xmlns:ns0="http://uri">
    <ns0:child name="name">
    ...
    </ns0:child>
</ns0:root>

As you can see, the default namespace is changed to ns0.

Is there any way to solve this problem with ElementTree?

  • Possible duplicate of xml.etree.ElementTree - Trouble setting xmlns = '...' Commented Jan 30, 2019 at 14:14
  • The dup link clearly uses ET.register_namespace(.... Edit your question to include a minimal reproducible example showing how you use it. Commented Jan 30, 2019 at 19:23
  • @stovfl It's not about preserving the namespace, and it didn't help me. The namespace should not be hard-coded; it can be xmlns:prefix="URI" with any prefix and URI. Commented Jan 31, 2019 at 18:29
  • The only way to preserve the namespace prefix with ElementTree is by using register_namespace(). If you don't like that, try lxml instead. Commented Jan 31, 2019 at 18:42
  • See stackoverflow.com/a/42372404/407651 for a way to get the namespaces in the document. Commented Feb 1, 2019 at 6:13

1 Answer

ElementTree replaces any namespace prefix that has not been registered with ET.register_namespace. To preserve a prefix, register it before writing your modifications to a file. The following function does this by registering every namespace declared in the file globally:

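import xml.etree.ElementTree as ET

def register_all_namespaces(filename):
    # Each 'start-ns' event yields a (prefix, uri) pair; the default namespace has prefix ''.
    namespaces = dict(node for _, node in ET.iterparse(filename, events=['start-ns']))
    for prefix, uri in namespaces.items():
        ET.register_namespace(prefix, uri)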

Call this function before writing the tree (calling it before ET.parse, as here, is fine), so that the namespace prefixes remain unchanged:

import xml.etree.ElementTree as ET
register_all_namespaces('filename.xml')
tree = ET.parse('filename.xml')
# XML modification here
# save the modifications
tree.write('filename.xml')
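
As a hedged illustration of the default-namespace case from the question: the 'start-ns' event reports the default namespace with an empty prefix, and registering an empty prefix makes ElementTree serialize that URI as the default namespace. The sketch below uses a hypothetical in-memory sample (SAMPLE, modeled on the question's document) instead of a file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, modeled on the one in the question.
SAMPLE = '<root xmlns="http://uri"><child name="name" /></root>'

root = ET.fromstring(SAMPLE)

# An empty prefix registers the URI as the default namespace for serialization.
ET.register_namespace('', 'http://uri')

output = ET.tostring(root, encoding='unicode')
print(output)
```

Note that the registration is global to the process, so it also affects any other tree serialized afterwards that uses the same URI.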

5 Comments

This solution is much better than I have seen on many other questions for the same topic. Thanks for sharing it.
Does this mean the XML needs to be parsed twice? Or can I somehow get the ElementTree out of this process while I'm at it?
@Starwarswii Yes. If you want more control over that, I think you can use XMLPullParser with the start-ns event, fetching the namespaces and then calling ET.register_namespace.
Thank you for this answer. I was pulling my hair out with my namespaces getting replaced after a simple tweak to the XML.
It does not matter if register_namespace comes before or after ET.parse. register_namespace only affects serialization, not parsing.
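
On the double-parsing question raised in the comments, here is a minimal single-pass sketch. It is not the XMLPullParser approach suggested above; instead it relies on the root attribute that the ET.iterparse iterator exposes once the source has been fully read. The function name parse_and_register and the in-memory SAMPLE document are hypothetical, chosen only for illustration.

```python
import io
import xml.etree.ElementTree as ET

def parse_and_register(source):
    """Parse `source` once, registering every namespace declared in it."""
    events = ET.iterparse(source, events=['start-ns'])
    for _, (prefix, uri) in events:
        ET.register_namespace(prefix, uri)
    # Once the iterator is exhausted, it exposes the parsed root element.
    return ET.ElementTree(events.root)

# Hypothetical in-memory document standing in for 'filename.xml'.
SAMPLE = b'<root xmlns:prefix="URI"><child prefix:name="***" /></root>'

tree = parse_and_register(io.BytesIO(SAMPLE))
# XML modification here
buffer = io.BytesIO()
tree.write(buffer)
result = buffer.getvalue().decode()
print(result)
```

One caveat: ET.register_namespace raises ValueError for prefixes of the form ns followed by digits (such as ns0), because that format is reserved for ElementTree's generated prefixes, so a document that already uses such a prefix would need extra handling.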
