I'm puzzled by how the minidom parser handles empty elements, as shown in the following code.

import xml.dom.minidom

# Case 1: the empty element comes from parsed XML.
doc = xml.dom.minidom.parseString('<value></value>')
print(repr(doc.firstChild.nodeValue))
# Out: None
print(doc.firstChild.toxml())
# Out: <value/>

# Case 2: the element is built at runtime with an empty text node.
doc = xml.dom.minidom.Document()
v = doc.appendChild(doc.createElement('value'))
v.appendChild(doc.createTextNode(''))
print(repr(v.firstChild.nodeValue))
# Out: ''
print(doc.firstChild.toxml())
# Out: <value></value>

How can I get consistent behavior? I'd like to receive an empty string as the value of an empty element (which IS what I put into the XML structure in the first place).

3 Answers

Cracking open xml.dom.minidom and searching for "/>", we find this:

# Method of the Element(Node) class.
def writexml(self, writer, indent="", addindent="", newl=""):
    # [snip]
    if self.childNodes:
        writer.write(">%s"%(newl))
        for node in self.childNodes:
            node.writexml(writer,indent+addindent,addindent,newl)
        writer.write("%s</%s>%s" % (indent,self.tagName,newl))
    else:
        writer.write("/>%s"%(newl))

We can deduce from this that the short-end-tag form only occurs when childNodes is an empty list. Indeed, this seems to be true:

>>> from xml.dom.minidom import Document
>>> doc = Document()
>>> v = doc.appendChild(doc.createElement('v'))
>>> v.toxml()
'<v/>'
>>> v.childNodes
[]
>>> v.appendChild(doc.createTextNode(''))
<DOM Text node "''">
>>> v.childNodes
[<DOM Text node "''">]
>>> v.toxml()
'<v></v>'

As pointed out by Lloyd, the XML spec makes no distinction between the two. If your code does make the distinction, that means you need to rethink how you want to serialize your data.
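To illustrate the equivalence on the parsing side, here is a small sketch (Python 3 syntax; the variable names are only illustrative) showing that minidom builds the same childless element from either spelling:

```python
from xml.dom.minidom import parseString

# Parse both spellings of an empty element.
short_form = parseString('<v/>').documentElement
long_form = parseString('<v></v>').documentElement

# Neither element gets a text child, so both serialize identically.
print(len(short_form.childNodes), len(long_form.childNodes))
print(short_form.toxml(), long_form.toxml())
```

So an empty text node only exists if your own code appends one; the parser never creates it.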

xml.dom.minidom simply serializes the two cases differently because that is easier to code. You can, however, get consistent output. Subclass Element and override its writexml method (toxml delegates to it) so that it emits the short-end-tag form whenever there are no child nodes with non-empty text content. Then monkeypatch the module to use your new Element class.
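A minimal sketch of that subclass-and-monkeypatch idea, assuming Python 3. It overrides writexml, because toxml ends up calling writexml internally. The names ConsistentElement and _BaseElement are hypothetical, and temporarily hiding the child list is just one way to reach the stock "/>" branch, not something minidom prescribes:

```python
import xml.dom.minidom as minidom

# Keep a reference to the stock class before patching; looking up
# minidom.Element after the patch would return the subclass and recurse.
_BaseElement = minidom.Element


class ConsistentElement(_BaseElement):  # hypothetical name
    def writexml(self, writer, indent="", addindent="", newl=""):
        children = self.childNodes
        # True only when every child is a text node holding ''.
        only_empty_text = bool(children) and all(
            child.nodeType == child.TEXT_NODE and child.data == ""
            for child in children
        )
        if not only_empty_text:
            _BaseElement.writexml(self, writer, indent, addindent, newl)
            return
        # Hide the empty text nodes during serialization so the stock
        # writer takes its "/>" branch, then restore them.
        self.childNodes = minidom.NodeList()
        try:
            _BaseElement.writexml(self, writer, indent, addindent, newl)
        finally:
            self.childNodes = children


# Monkeypatch: createElement and the parser look up Element at call time.
minidom.Element = ConsistentElement

doc = minidom.Document()
v = doc.appendChild(doc.createElement('value'))
v.appendChild(doc.createTextNode(''))
print(v.toxml())                    # short form, text node still present
print(v.firstChild.nodeValue == '')
```

The patch is global to the process, so it affects every document serialized afterwards; apply it once at startup if you go this route.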

2 Comments

That's my point exactly. The XML spec defines the two forms as equivalent, but minidom gives <v></v> the value '' when the element is created at runtime with an empty text node, and yet parses <v></v> into Element "v" without any Text child node.
I followed your advice to change my approach on data serialization. I'll try JSON, as it better fits my needs. Thanks for help.
value = thing.firstChild.nodeValue or ''
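Note that the one-liner assumes thing.firstChild exists; on an element with no children, firstChild is None and the attribute access raises AttributeError. A slightly more defensive sketch (text_of is a hypothetical helper name, not part of minidom) that returns '' in either case:

```python
from xml.dom.minidom import parseString


def text_of(element):
    """Join the data of the element's direct text/CDATA children ('' if none)."""
    return ''.join(
        child.data
        for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


empty = parseString('<value></value>').documentElement
filled = parseString('<value>abc</value>').documentElement
print(repr(text_of(empty)), repr(text_of(filled)))
```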

2 Comments

Unfortunately, this does not solve my problem. In my code, I call the replaceWholeText method on a Text node in the XML document. If I previously stored an empty string in that Text node, it disappears the next time the XML file is parsed, and I can no longer call replaceWholeText. I could rebuild that node if it's not there, but that would be a very ugly hack.
What do you mean "rebuild the element"? It exists, its value just happens to be None instead of ''.

The XML spec does not distinguish between these two cases.
