Editing and duplicating an XML block with ElementTree

Question

I would like to edit the below XML as follows: The 'duplicateAndAddOne' block should be duplicated (name changed to 'newElements') and have all items in it incremented by one. Ideally the elements in it should not have to be read individually, but it should be done as a batch, as there will be many items.

<?xml version="1.0"?>
<data>
    <Strategy name="duplicateAndAddOne">
        <datapoint1>7</datapoint1>
        <datapoint2>9</datapoint2>
    </Strategy>
    <Strategy name="leaveMeAlone">
        <datapoint1>22</datapoint1>
        <datapoint2>23</datapoint2>
    </Strategy>
</data>

Corley Brigman · Accepted Answer · 2015-10-13 13:34:57Z


This seems to depend on whether you are using the built-in ElementTree, or lxml.

With lxml, you should be able to use copy:

from lxml import etree
e = etree.Element('root')
etree.SubElement(e, 'child1')
etree.SubElement(e, 'child2')

from copy import copy
f = copy(e)
f[0].tag = 'newchild1'
etree.dump(e)
<root>
  <child1/>
  <child2/>
</root>

etree.dump(f)
<root>
  <newchild1/>
  <child2/>
</root>

You can see that the new tree is actually separate from the old one; this is because lxml stores the parent in the element, and so can't reuse them - it has to create new elements for every child.

ElementTree doesn't keep the parent in the element, so the same element can sit in several trees at once. copy and element.copy() are shallow: they copy the node, but then connect it to the children of the original node, so changing a child through the copy also changes the original, which is not what you want. When I tried it, deepcopy appeared to behave the same way; in current Python 3 versions copy.deepcopy on an Element should return a fully independent tree, so test it on your version before relying on the workaround that follows.
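The sharing between a shallow copy and its original can be seen in a small sketch that uses only the standard library. The element names are the same illustrative ones used in the lxml example; nothing here is specific to the XML in the question.

```python
import copy
import xml.etree.ElementTree as ET

# Build a tiny tree: <root><child1/><child2/></root>
e = ET.Element('root')
ET.SubElement(e, 'child1')
ET.SubElement(e, 'child2')

# A shallow copy creates a new top-level node...
f = copy.copy(e)

# ...but the children are the very same objects as in the original.
shares_children = f[0] is e[0]

# Renaming a child through the copy therefore renames it in the original too.
f[0].tag = 'newchild1'
original_first_tag = e[0].tag
```

After this runs, shares_children is True and original_first_tag is 'newchild1', which is the behaviour that makes a shallow copy unsuitable for duplicating a block you intend to edit.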

The simplest way I've discovered to make this work properly is simply to serialize to a string, and then deserialize it again. This forces completely new elements to be created. It is pretty slow - but it also always works. Compare the following methods:

import xml.etree.ElementTree as etree
from copy import copy

e = etree.Element('root')
etree.SubElement(e, 'child1')
etree.SubElement(e, 'child2')

#f = copy(e)
#f[0].tag = 'newchild1'
# If you do the above, the first child of e will also be 'newchild1',
# so that won't work.

# Simple method, slow but complete
In [194]: %timeit f = etree.fromstring(etree.tostring(e))
10000 loops, best of 3: 71.8 µs per loop

# Faster method, but you must implement the recursion - this only
# looks at a single level.
In [195]: %%timeit
   .....: f = etree.Element('newparent')
   .....: f.extend([x.copy() for x in e])
   .....:
100000 loops, best of 3: 9.49 µs per loop

This bottom method does create copies of the first-level children, and it is a lot faster than the first version. However, this only works for a single level of nesting; if any of these had children, you'd have to go down and copy those yourself as well. You may be able to write a recursive copy, and it might be faster; the places where I've done this haven't been performance-sensitive so I haven't bothered in my code. The tostring/fromstring routine is fairly inefficient, but straightforward, and always works no matter how deep the tree is.
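A recursive copy along the lines mentioned above might look like the following sketch. The helper name clone is hypothetical (it is not part of ElementTree), and the sketch has not been timed against the tostring/fromstring round trip, so treat any speed advantage as an assumption to measure.

```python
import xml.etree.ElementTree as ET

def clone(elem):
    """Return a new element that shares nothing with elem (hypothetical helper)."""
    # Copy the attribute dict so edits to the clone do not touch the original.
    new = ET.Element(elem.tag, dict(elem.attrib))
    new.text = elem.text
    new.tail = elem.tail
    # Recurse so that every level of nesting gets fresh elements.
    for child in elem:
        new.append(clone(child))
    return new

# Usage: a two-level tree, where a single-level copy would not be enough.
root = ET.fromstring('<root a="1"><child1><leaf>1</leaf></child1><child2/></root>')
dup = clone(root)
dup[0][0].text = '2'
dup.set('a', '9')
```

Here root[0][0].text stays '1' and root.get('a') stays '1', because the clone owns its own elements and attribute dictionaries at every depth.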


2 Comments

Nickpick Over a year ago

but how can I make a selection to only consider duplicateAndAddOne?

Corley Brigman Over a year ago

You'll have to walk through each element, read its 'name' attribute, and then decide what to do with it. Although, if you have control over this XML, Strategy doesn't appear to be a good name for that section; something like DataFrame with a strategy='duplicate' attribute would seem to be more semantically consistent...
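Putting the pieces together for the XML in the question, one possible approach is sketched below with the standard-library ElementTree. It selects the block by its name attribute with an attribute predicate instead of a manual loop, duplicates it with the tostring/fromstring round trip from the answer, and increments every child in a single loop. It assumes each child of the block contains a plain integer.

```python
import xml.etree.ElementTree as ET

XML = """<data>
    <Strategy name="duplicateAndAddOne">
        <datapoint1>7</datapoint1>
        <datapoint2>9</datapoint2>
    </Strategy>
    <Strategy name="leaveMeAlone">
        <datapoint1>22</datapoint1>
        <datapoint2>23</datapoint2>
    </Strategy>
</data>"""

root = ET.fromstring(XML)

# Pick only the block whose 'name' attribute matches.
source = root.find("./Strategy[@name='duplicateAndAddOne']")

# Serialize and re-parse to get elements that are independent of the original.
duplicate = ET.fromstring(ET.tostring(source))
duplicate.set('name', 'newElements')

# Increment all items in one pass (assumption: every child holds an integer).
for item in duplicate:
    item.text = str(int(item.text) + 1)

# Attach the new block next to the existing ones.
root.append(duplicate)

result = ET.tostring(root, encoding='unicode')
```

The original duplicateAndAddOne block keeps its values of 7 and 9, and the appended newElements block holds 8 and 10. The whitespace around the appended block will not match the hand-indented input, so re-indent the output if layout matters.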
