
I have an issue with ElementTree that I can't quite figure out. I've read all of its documentation as well as all the information I could find on this forum. I have a couple of elements/nodes that I am trying to remove using ElementTree. I don't get any errors with the following code, but when I look at the output file I wrote the changes to, the elements/nodes I expected to be removed are still there. I have a document that looks like this:

<data>
  <config>
    <script filename="test1.txt"></script>
    <documentation filename="test2.txt"></script>
  </config>
</data>

My code looks as follows:

import os
import xml.etree.ElementTree as ElementTree

xmlTree = ElementTree.parse(os.path.join(sourcePath, "test.xml"))
xmlRoot = xmlTree.getroot()
for doc in xmlRoot.findall('documentation'):
    xmlRoot.remove(doc)

xmlTree.write(os.path.join(sourcePath, "testTWO.xml"))

The result is the following document:

<data>
  <config>
    <script filename="test1.txt" />
    <documentation filename="test2.txt" />
  </config>
</data>

What I need is something more like this. I am not stuck using ElementTree. If there is a better solution with lxml or some other library, I am all ears. I know ElementTree can be a little bit of a pain at times.

<data>
  <config>
  </config>
</data>

1 Answer

xmlRoot.findall('documentation') in your code didn't find anything, because <documentation> isn't a direct child of the root element <data>. It is actually a direct child of <config>:

"Element.findall() finds only elements with a tag which are direct children of the current element". [19.7.1.3. Finding interesting elements]
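To see the difference concretely, here is a small sketch (Python 3, standard library only). It parses the sample from the question, assuming the <documentation> element is given a matching closing tag so the document is well-formed, and compares a bare tag name with a path that goes through <config>:

```python
import xml.etree.ElementTree as ElementTree

# The question's sample, with <documentation> closed by its own tag (assumption)
# so that it parses.
SAMPLE = """<data>
  <config>
    <script filename="test1.txt"></script>
    <documentation filename="test2.txt"></documentation>
  </config>
</data>"""

root = ElementTree.fromstring(SAMPLE)

# A bare tag name only matches direct children of <data>, so nothing is found.
direct = root.findall('documentation')

# Spelling out the path through <config> reaches the element.
via_config = root.findall('config/documentation')

print(len(direct), len(via_config))  # 0 1
```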

This is one possible way to remove all children of <config> using findall(), given the sample XML you posted (and assuming that the actual XML closes the <documentation> element with a proper closing tag instead of </script>):

# ... (xmlTree and xmlRoot set up as in the question) ...
config = xmlRoot.find('config')

# find all children of config
for doc in config.findall('*'):
    config.remove(doc)
    # print just to make sure the element to be removed is correct
    print(ElementTree.tostring(doc, encoding='unicode'))
# ...
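For reference, a self-contained version of the same idea that runs on its own might look like the sketch below. It again assumes the corrected sample and prints the serialized tree instead of writing a file:

```python
import xml.etree.ElementTree as ElementTree

# The question's sample, with <documentation> closed by its own tag (assumption).
SAMPLE = """<data>
  <config>
    <script filename="test1.txt"></script>
    <documentation filename="test2.txt"></documentation>
  </config>
</data>"""

root = ElementTree.fromstring(SAMPLE)
config = root.find('config')

# findall('*') returns a new list of config's direct children, so removing
# them inside the loop is safe.
for child in config.findall('*'):
    config.remove(child)

# Serialize to a str (Python 3) instead of writing a file, to inspect the result.
result = ElementTree.tostring(root, encoding='unicode')
print(result)
```

Because findall() returns a new list, removing elements inside the loop does not skip any of them; iterating directly over `config` while removing from it could.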

Comments

Thank you for the example. I can see where I went wrong based on what you provided. However, in the initial XML example I only put a couple of elements under <config>. I only want to remove the <script> and <documentation> elements and keep everything else, so I added the following code. The documentation node gets deleted, but the script node does not. When I print ElementTree.tostring() I see that it properly finds the <script> and <documentation> elements.
Here is the code:

# Remove the 'documentation' and 'script' tags from test.xml
if documentation is None:
    pass
else:
    config = xmlRoot.find('config')
    for doc in config.findall('documentation'):
        config.remove(doc)
        print(ElementTree.tostring(doc))

if script is None:
    pass
else:
    config = xmlRoot.find('config')
    for script in config.findall('script'):
        config.remove
        print(ElementTree.tostring(script))

xmlTree.write(os.path.join(sourcePath, "driverTWO.xml"))
Never mind. I forgot to pass the variable to config.remove. I changed it to config.remove(script) and it works fine.
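A sketch of the selective version described in these comments. It assumes the same corrected sample plus a hypothetical <other> element standing in for the content under <config> that should be kept:

```python
import xml.etree.ElementTree as ElementTree

# Sample based on the question; <other> is a hypothetical element standing in
# for "everything else" under <config> that should survive.
SAMPLE = """<data>
  <config>
    <script filename="test1.txt"></script>
    <documentation filename="test2.txt"></documentation>
    <other name="keep-me"></other>
  </config>
</data>"""

TAGS_TO_REMOVE = ('script', 'documentation')

root = ElementTree.fromstring(SAMPLE)
config = root.find('config')

for tag in TAGS_TO_REMOVE:
    for elem in config.findall(tag):
        # Pass the element itself: a bare `config.remove` only references
        # the method and removes nothing.
        config.remove(elem)

remaining = [child.tag for child in config]
print(remaining)  # ['other']
```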
I would like to point out that you can use findall to find elements anywhere in an XML tree (not just direct children). findall(".//documentation") is an example. See docs.python.org/2/library/….
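Building on that comment: findall(".//documentation") locates matches at any depth, but Element.remove() has to be called on the element's direct parent, and ElementTree elements do not store a reference to their parent. One way around that is to walk every element and remove matching direct children. The sketch below uses a hypothetical helper remove_everywhere() and a hypothetical extra top-level <documentation> element to show that depth doesn't matter:

```python
import xml.etree.ElementTree as ElementTree

# Corrected sample plus a hypothetical top-level <documentation> element.
SAMPLE = """<data>
  <config>
    <script filename="test1.txt"></script>
    <documentation filename="test2.txt"></documentation>
  </config>
  <documentation filename="top-level.txt"></documentation>
</data>"""

root = ElementTree.fromstring(SAMPLE)

# ".//documentation" matches at any depth below root.
found = root.findall('.//documentation')
print(len(found))  # 2

def remove_everywhere(root, tag):
    """Hypothetical helper: delete every element named `tag` below root."""
    # Snapshot the elements and each parent's children first, so removals
    # do not disturb the iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)

remove_everywhere(root, 'documentation')
print(root.findall('.//documentation'))  # []
```

If switching to lxml is an option, its elements provide getparent(), so each match can be removed with elem.getparent().remove(elem) instead.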
