Python script to remove all comments from XML file

Question

I am trying to build a python script that will take in an XML document and remove all of the comment blocks from it.

I tried something along the lines of:

tree = ElementTree()
tree.parse(file)
commentElements = tree.findall('//comment()')

for element in commentElements:
    element.parentNode.remove(element)

Doing this yields a weird error from Python: "KeyError: '()'".

I know there are ways to easily edit the file using other methods (like sed), but I have to do it in a Python script.

'//comment()' does not seem to be a valid search path format and is causing the KeyError. Can you please include an XML sample and expand on the pattern you are trying to catch?
– jdi, Commented May 3, 2012 at 18:27
comment() is an XPath node test that is not supported by ElementTree. Try lxml, which has full support for XPath 1.0.
– mzjn, Commented May 3, 2012 at 18:35

mzjn · Accepted Answer · 2012-05-04 09:51:34Z

12

comment() is an XPath node test that is not supported by ElementTree.

You can use comment() with lxml. This library is quite similar to ElementTree and it has full support for XPath 1.0.

Here is how you can remove comments with lxml:

from lxml import etree

XML = """<root>
  <!-- COMMENT 1 -->
  <x>TEXT 1</x>
  <y>TEXT 2 <!-- COMMENT 2 --></y>
</root>"""

tree = etree.fromstring(XML)

comments = tree.xpath('//comment()')

for c in comments:
    p = c.getparent()
    p.remove(c)

print(etree.tostring(tree, encoding="unicode"))

Output:

<root>
  <x>TEXT 1</x>
  <y>TEXT 2 </y>
</root>
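
One caveat, offered as a sketch rather than as part of the original answer: in lxml, remove() takes the removed node's tail text along with it, so text that follows a comment inside mixed content would be lost. The loop above also assumes every comment has a parent, which is not the case for comments outside the root element. The helper below (remove_comments_keep_tail is a made-up name, and the one-line document is invented for illustration) moves the tail onto the preceding sibling or the parent before removing the comment:

```python
from lxml import etree


def remove_comments_keep_tail(root):
    """Remove every comment in the document, keeping the text that follows each one."""
    for c in root.xpath('//comment()'):
        parent = c.getparent()
        if parent is None:
            # Comment sits outside the root element; this sketch leaves it alone.
            continue
        tail = c.tail or ''
        prev = c.getprevious()
        if prev is not None:
            # Text after the comment becomes part of the previous sibling's tail.
            prev.tail = (prev.tail or '') + tail
        else:
            # No previous sibling: the text belongs to the parent's leading text.
            parent.text = (parent.text or '') + tail
        parent.remove(c)
    return root


doc = etree.fromstring('<y>before <!-- note --> after</y>')
remove_comments_keep_tail(doc)
result = etree.tostring(doc, encoding='unicode')
print(result)  # <y>before  after</y>
```

With a plain remove() call, the same input would lose the " after" text.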

edited May 4, 2012 at 9:51

answered May 3, 2012 at 18:51

mzjn


ctjctj2 · Accepted Answer · 2013-08-29 17:13:52Z

8

Use strip_tags() from lxml.etree

from lxml import etree
XML = """<root>
  <!-- COMMENT 1 -->
  <x>TEXT 1</x>
  <y>TEXT 2 <!-- COMMENT 2 --></y>
  </root>"""

tree = etree.fromstring(XML)
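print(etree.tostring(tree, encoding="unicode"))
# strip_tags() drops the comment nodes but keeps the text around them
etree.strip_tags(tree, etree.Comment)
print(etree.tostring(tree, encoding="unicode"))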

Output:

<root>
  <!-- COMMENT 1 -->
  <x>TEXT 1</x>
  <y>TEXT 2 <!-- COMMENT 2 --></y>
  </root>
<root>

  <x>TEXT 1</x>
  <y>TEXT 2 </y>
  </root>
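
The blank line left behind after `<root>` in the second printout suggests that strip_tags() leaves the whitespace that followed the comment in place. A small check of that behaviour on mixed content, using a made-up one-line document (not taken from the answer):

```python
from lxml import etree

# Hypothetical sample: text on both sides of a comment inside one element.
doc = etree.fromstring('<y>before <!-- note --> after</y>')

# Strip only comment nodes; elements and text are left untouched.
etree.strip_tags(doc, etree.Comment)

result = etree.tostring(doc, encoding='unicode')
print(result)  # <y>before  after</y>
```

The text on either side of the comment survives, which matters when comments appear in the middle of running text.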

answered Aug 29, 2013 at 17:13

ctjctj2


Community · Accepted Answer · 2017-05-23 12:08:59Z

6

The same as https://stackoverflow.com/a/3317008/1458574:

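from lxml import etree
import sys

# Read bytes so lxml can honor any encoding declaration in the file
with open(sys.argv[1], "rb") as f:
    XML = f.read()
# remove_comments=True makes the parser discard comments while parsing
parser = etree.XMLParser(remove_comments=True)
tree = etree.fromstring(XML, parser=parser)
print(etree.tostring(tree, encoding="unicode"))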
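
Since the question asks for a script that takes a file and produces a cleaned file, here is one possible way to wrap the same parser option, as a sketch only. The function name strip_comments_from_file and the in.xml / out.xml paths are assumptions made for the example; the demo writes its own small input file to a temporary directory so that it can run on its own.

```python
import os
import tempfile

from lxml import etree


def strip_comments_from_file(src_path, dst_path):
    """Parse src_path while discarding comments, then write the result to dst_path."""
    parser = etree.XMLParser(remove_comments=True)
    tree = etree.parse(src_path, parser)
    tree.write(dst_path, encoding='utf-8', xml_declaration=True)


# Demo with a throwaway input file (hypothetical content and file names).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'in.xml')
dst = os.path.join(workdir, 'out.xml')
with open(src, 'wb') as f:
    f.write(b'<?xml version="1.0" encoding="UTF-8"?>\n'
            b'<root><!-- c --><x>1</x></root>')

strip_comments_from_file(src, dst)

with open(dst, 'rb') as f:
    cleaned = f.read()
print(cleaned.decode('utf-8'))
```

In a real script the two paths would more likely come from sys.argv, as in the snippet above.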

edited May 23, 2017 at 12:08

Community Bot


answered Nov 12, 2013 at 21:51

user1458574


1 Comment

mzjn Over a year ago

remove_comments=True works fine, but it's not used in the linked answer. So why do you say that it is "the same"?

daveoncode · Accepted Answer · 2012-10-06 12:05:00Z

3

This is the solution I implemented using minidom:

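from xml.dom import Node

def removeCommentNodes(self):
    # self.dom is the parsed minidom document
    for tag in self.dom.getElementsByTagName("*"):
        # Loop over a copy: removing from the live childNodes list skips siblings
        for n in list(tag.childNodes):
            if n.nodeType == Node.COMMENT_NODE:
                tag.removeChild(n)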

In practice, I first retrieve all the elements in the XML document; then, for each element, I look for comment child nodes and remove any I find (self.dom is a reference to the parsed XML document).
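
For readers who want to try the minidom approach outside a class, here is a standalone sketch of the same idea. It is not the answer's original code: remove_comment_nodes is a made-up name, the sample document is invented, and the function recurses from the document node so that comments sitting next to the root element are covered as well.

```python
from xml.dom import Node, minidom


def remove_comment_nodes(node):
    """Recursively remove all comment nodes below the given node."""
    # Iterate over a copy, because removeChild() mutates node.childNodes.
    for child in list(node.childNodes):
        if child.nodeType == Node.COMMENT_NODE:
            node.removeChild(child)
        else:
            remove_comment_nodes(child)


dom = minidom.parseString('<root><!-- a --><!-- b --><x>1<!-- c --></x></root>')
remove_comment_nodes(dom)
result = dom.documentElement.toxml()
print(result)  # <root><x>1</x></root>
```

The two adjacent comments in the sample are the case where looping over the live childNodes list would leave one of them behind.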

answered Oct 6, 2012 at 12:05

daveoncode
