Environment: Python 2.6.5, Eclipse SDK 3.7.1, Pydev 2.3

I am trying to parse and change values in XML data in Python using xml.dom.minidom, and I'm having an issue with blank text nodes.

When I parse an XML file into a DOM object and then convert it back to a string using toxml(), the "Description" elements that have no text come back changed: the separate closing tag is gone and the element is written as a self-closing tag.

Does anyone know what the problem is?

Contents of issue.py:

from xml.dom import minidom
# Parse the file and take the first <NewsShows> element
xml_dom_object = minidom.parse('news_shows.xml')
main_node = xml_dom_object.getElementsByTagName('NewsShows')[0]
# Serialize that element back to a string
xml_string = main_node.toxml()
print xml_string

Contents of news_shows.xml (notice the two "Description" elements with no text):

<NewsShows Planet="Earth" Language="English" Year="2012">
<NewsShow ShowName="The_Young_Turks">
 <Description Detail="Best_show_of_all_time_according_to_many">True</Description>
 <Description Detail="The_only_source_of_truth"></Description>
 <Description Detail="Three_hours_of_truth_per_day">True</Description>
</NewsShow>
<NewsShow ShowName="The_Rachel_Maddow_Show">
<Description Detail="Pretty_great_as_well">True</Description>
<Description Detail="Sucks_badly">False</Description>
<Description Detail="Conveys_more_information_than_TYT"></Description>
</NewsShow>
</NewsShows>

Output of the script (notice the two "Description" elements that are now written as self-closing tags):

<NewsShows Language="English" Planet="Earth" Year="2012">
<NewsShow ShowName="The_Young_Turks">
 <Description Detail="Best_show_of_all_time_according_to_many">True</Description>
 <Description Detail="The_only_source_of_truth"/>
 <Description Detail="Three_hours_of_truth_per_day">True</Description>
</NewsShow>
<NewsShow ShowName="The_Rachel_Maddow_Show">
<Description Detail="Pretty_great_as_well">True</Description>
<Description Detail="Sucks_badly">False</Description>
<Description Detail="Conveys_more_information_than_TYT"/>
</NewsShow>
</NewsShows>

2 Answers

Below is the Element.writexml method from the source file "python-3.2.3.amd64\Lib\xml\dom\minidom.py".

def writexml(self, writer, indent="", addindent="", newl=""):
    # indent = current indentation
    # addindent = indentation to add to higher levels
    # newl = newline string
    writer.write(indent+"<" + self.tagName)

    attrs = self._get_attributes()
    a_names = sorted(attrs.keys())

    for a_name in a_names:
        writer.write(" %s=\"" % a_name)
        _write_data(writer, attrs[a_name].value)
        writer.write("\"")
    if self.childNodes:
        writer.write(">")
        if (len(self.childNodes) == 1 and
            self.childNodes[0].nodeType == Node.TEXT_NODE):
            self.childNodes[0].writexml(writer, '', '', '')
        else:
            writer.write(newl)
            for node in self.childNodes:
                node.writexml(writer, indent+addindent, addindent, newl)
            writer.write(indent)
        writer.write("</%s>%s" % (self.tagName, newl))
    else:
        writer.write("/>%s"%(newl))

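According to this function, if "self" (the element being written to XML) has no childNodes, the writer emits a self-closing tag. An element such as <Description ...></Description> has no child nodes after parsing, so toxml() writes it as <Description .../>.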
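To see that branch in action, here is a minimal sketch. The <Root> document and the variable names are made up for illustration and are not from the question's file. It is written to run on Python 3; the minidom calls themselves should be the same on 2.6.

```python
from xml.dom import minidom

# Hypothetical two-element document: one empty Description, one with text.
doc = minidom.parseString(
    '<Root>'
    '<Description Detail="a"></Description>'
    '<Description Detail="b">True</Description>'
    '</Root>'
)
empty, filled = doc.getElementsByTagName('Description')

# The parser creates no Text node for an element without content,
# so childNodes is empty and writexml() takes the "/>" branch.
print(len(empty.childNodes))
print(empty.toxml())

# The element with text has exactly one Text child and keeps its closing tag.
print(len(filled.childNodes))
print(filled.toxml())
```

So the closing tag is not lost during serialization as such; the element simply has nothing between its tags once it is in the DOM.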

1 Comment

@Sumeet Barai If this is not what you expect, the only way to change it is to modify this source file and recompile.

Is this actually causing a problem somewhere? As far as I know about XML, the strings <tag></tag> and <tag/> are equivalent.
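A quick way to check this with minidom itself (a small sketch; the <tag> documents and variable names are only for illustration):

```python
from xml.dom import minidom

# Parse both spellings of an empty element.
long_form = minidom.parseString('<tag></tag>').documentElement
short_form = minidom.parseString('<tag/>').documentElement

# Neither element has any child nodes after parsing...
print(long_form.hasChildNodes())
print(short_form.hasChildNodes())

# ...and both serialize to the same string.
print(long_form.toxml() == short_form.toxml())
```

Both forms produce the same DOM, so a conforming XML consumer should not care which one is written out.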

1 Comment

Thanks! It seems that minidom does not keep blank Text nodes when it parses the XML (or something like that). I was trying to change the value of all Text nodes, but the blank ones were getting skipped because they were not in the tree. I solved it by checking whether a Text node exists and, if not, creating one with the new value and adding it to the structure using createTextNode() and appendChild().
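For anyone landing here later, the workaround described in this comment could look roughly like the sketch below. The helper name set_text and the sample document are my own assumptions, not the original poster's code, and it only updates the first Text child it finds.

```python
from xml.dom import minidom

def set_text(doc, element, value):
    """Set the text of `element`, creating a Text node if none exists."""
    for child in element.childNodes:
        if child.nodeType == child.TEXT_NODE:
            child.data = value  # reuse the existing Text node
            return
    # Empty element: no Text child was created by the parser, so add one.
    element.appendChild(doc.createTextNode(value))

# Hypothetical sample input with one filled and one empty Description.
doc = minidom.parseString(
    '<NewsShow>'
    '<Description Detail="x">True</Description>'
    '<Description Detail="y"></Description>'
    '</NewsShow>'
)
for description in doc.getElementsByTagName('Description'):
    set_text(doc, description, 'Updated')

result = doc.documentElement.toxml()
print(result)
```

After this, both Description elements carry a Text node, so both are written with an explicit closing tag.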
