Null Text Node issue with xml.dom.minidom in Python

Question

Environment: Python 2.6.5, Eclipse SDK 3.7.1, Pydev 2.3

I am trying to parse and change values in XML data in Python using xml.dom.minidom and I'm having an issue with blank text nodes.

When I parse an XML file into a DOM object and then convert it back to a string using toxml(), the "Description" elements that have no text lose their separate closing tags and are written out as self-closing tags instead.

Does anyone know what the problem is?

Contents of issue.py

from xml.dom import minidom  
xml_dom_object = minidom.parse('news_shows.xml')  
main_node = xml_dom_object.getElementsByTagName('NewsShows')[0]  
xml_string = main_node.toxml()  
print xml_string

Contents of news_shows.xml (notice the two blank Text nodes):

<NewsShows Planet="Earth" Language="English" Year="2012">
<NewsShow ShowName="The_Young_Turks">
 <Description Detail="Best_show_of_all_time_according_to_many">True</Description>
 <Description Detail="The_only_source_of_truth"></Description>
 <Description Detail="Three_hours_of_truth_per_day">True</Description>
</NewsShow>
<NewsShow ShowName="The_Rachel_Maddow_Show">
<Description Detail="Pretty_great_as_well">True</Description>
<Description Detail="Sucks_badly">False</Description>
<Description Detail="Conveys_more_information_than_TYT"></Description>
</NewsShow>
</NewsShows>

Output of the script (notice the two "Description" tags that are now self-closing):

<NewsShows Language="English" Planet="Earth" Year="2012">
<NewsShow ShowName="The_Young_Turks">
 <Description Detail="Best_show_of_all_time_according_to_many">True</Description>
 <Description Detail="The_only_source_of_truth"/>
 <Description Detail="Three_hours_of_truth_per_day">True</Description>
</NewsShow>
<NewsShow ShowName="The_Rachel_Maddow_Show">
<Description Detail="Pretty_great_as_well">True</Description>
<Description Detail="Sucks_badly">False</Description>
<Description Detail="Conveys_more_information_than_TYT"/>
</NewsShow>
</NewsShows>

Dmitry · Accepted Answer · 2012-07-12 08:05:59Z

1

Below is a code snippet from the source file "python-3.2.3.amd64\Lib\xml\dom\minidom.py".

def writexml(self, writer, indent="", addindent="", newl=""):
    # indent = current indentation
    # addindent = indentation to add to higher levels
    # newl = newline string
    writer.write(indent+"<" + self.tagName)

    attrs = self._get_attributes()
    a_names = sorted(attrs.keys())

    for a_name in a_names:
        writer.write(" %s=\"" % a_name)
        _write_data(writer, attrs[a_name].value)
        writer.write("\"")
    if self.childNodes:
        writer.write(">")
        if (len(self.childNodes) == 1 and
            self.childNodes[0].nodeType == Node.TEXT_NODE):
            self.childNodes[0].writexml(writer, '', '', '')
        else:
            writer.write(newl)
            for node in self.childNodes:
                node.writexml(writer, indent+addindent, addindent, newl)
            writer.write(indent)
        writer.write("</%s>%s" % (self.tagName, newl))
    else:
        writer.write("/>%s"%(newl))

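According to this function, if self (the element being written out as XML) has no child nodes, the writer emits a self-closing tag. An element written as <Description Detail="The_only_source_of_truth"></Description> has no text content, so after parsing it has no child nodes at all, and toxml() therefore writes it as <Description Detail="The_only_source_of_truth"/>.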
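A minimal Python 3 sketch of that rule in action, assuming an inline string in place of news_shows.xml so it runs on its own. It shows that the empty element has no child nodes and is written self-closing, and that appending an empty Text node with createTextNode('') gives it one child, so writexml takes the branch that writes a separate closing tag.

```python
from xml.dom import minidom

# Assumption: an inline sample stands in for news_shows.xml so the demo is self-contained.
SAMPLE = '<Description Detail="The_only_source_of_truth"></Description>'

doc = minidom.parseString(SAMPLE)
desc = doc.documentElement

# The parser creates no Text node for empty content, so childNodes is empty
# and Element.writexml takes the "/>" branch.
print(len(desc.childNodes))  # 0
print(desc.toxml())          # <Description Detail="The_only_source_of_truth"/>

# Appending an empty Text node gives the element exactly one child, so
# writexml writes ">", the (empty) text, and then "</Description>".
desc.appendChild(doc.createTextNode(''))
print(len(desc.childNodes))  # 1
print(desc.toxml())          # <Description Detail="The_only_source_of_truth"></Description>
```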

edited Jul 12, 2012 at 8:05

Dmitry


answered Jul 12, 2012 at 2:29

Hausen Zheng




Hausen Zheng Over a year ago

@Sumeet Barai If you do not want this behavior, the only way is to modify this source file and recompile.

Daenyth · Accepted Answer · 2012-07-13 16:19:18Z

0

Is this actually causing a problem somewhere? As far as I know, in XML the forms <tag></tag> and <tag/> are equivalent.
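A quick Python 3 check, not part of the original answer, that minidom itself treats the two spellings the same way: both parse to an element with no child nodes and the same attribute, and both serialize to the same string. The helper name `parse_root` is hypothetical.

```python
from xml.dom import minidom

def parse_root(text):
    """Parse an XML string and return its root element (hypothetical helper)."""
    return minidom.parseString(text).documentElement

explicit = parse_root('<tag attr="x"></tag>')
self_closing = parse_root('<tag attr="x"/>')

# Both spellings yield an element with no child nodes ...
print(len(explicit.childNodes), len(self_closing.childNodes))  # 0 0
# ... the same attribute value ...
print(explicit.getAttribute('attr') == self_closing.getAttribute('attr'))  # True
# ... and the same serialized form.
print(explicit.toxml())      # <tag attr="x"/>
print(self_closing.toxml())  # <tag attr="x"/>
```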

edited Jul 13, 2012 at 16:19

answered Jul 12, 2012 at 1:59

Daenyth



Sumeet Barai Over a year ago

Thanks! It seems that minidom does not create a Text node for an element with empty content when it parses the XML. I was trying to change the value of all Text nodes, but the empty elements were being skipped because they had no Text node to change. I solved it by checking whether a Text node exists and, if not, creating one with the new value using createTextNode() and adding it to the structure with appendChild().
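A minimal Python 3 sketch of the fix described in this comment, assuming an inline sample in place of news_shows.xml. `set_text` is a hypothetical helper name; createTextNode() and appendChild() are the standard minidom calls the comment mentions. After the loop, the previously empty element also has a Text child, so it serializes with an explicit closing tag.

```python
from xml.dom import minidom

# Assumption: inline sample standing in for part of news_shows.xml.
SAMPLE = """<NewsShow ShowName="The_Young_Turks">
<Description Detail="Best_show_of_all_time_according_to_many">True</Description>
<Description Detail="The_only_source_of_truth"></Description>
</NewsShow>"""

def set_text(doc, element, value):
    """Set an element's text (hypothetical helper).

    If the first child is a Text node, overwrite its data; otherwise append
    a new Text node, which is what an empty element needs because the parser
    created none for it.
    """
    first = element.firstChild
    # Compare with None explicitly: an empty Text node has len() 0 and is falsy.
    if first is not None and first.nodeType == minidom.Node.TEXT_NODE:
        first.data = value
    else:
        element.appendChild(doc.createTextNode(value))

doc = minidom.parseString(SAMPLE)
for desc in doc.getElementsByTagName('Description'):
    set_text(doc, desc, 'False')

print(doc.documentElement.toxml())
```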
