I want to merge certain sub-elements of an XML file. This is the format I have:

 <?xml version='1.0' encoding='ISO-8859-1'?><?xml-stylesheet type='text/xsl' href='image_metadata_stylesheet.xsl'?><dataset><name>imglab dataset</name><comment>Created by imglab tool.</comment><images>
<image file='/home/user126043/Documents/testimages/9935.jpg'>
<box top='329' left='510' width='385' height='534'>
<label>Pirelli
</label></box></image>
<image file='/home/user126043/Documents/testimages/9935.jpg'>
<box top='360' left='113' width='440' height='147'>
<label>Pirelli
</label></box></image>
<image file='/home/user126043/Documents/testimages/9921.jpg'>
<box top='329' left='510' width='385' height='534'>
<label>Pirelli
</label></box></image>
</images></dataset>

In the above XML, the box coordinates for image 9935.jpg are specified twice, in two separate <image> elements, and I want to merge them into one. That is, I want to remove the <image> element that is repeated for the same file and collect all box coordinates for each image inside a single <image> element. I have never worked with XML, so I am not sure whether my terminology is right. The desired output is:

<?xml version='1.0' encoding='ISO-8859-1'?><?xml-stylesheet type='text/xsl' href='image_metadata_stylesheet.xsl'?><dataset><name>imglab dataset</name><comment>Created by imglab tool.</comment><images>
    <image file='/home/user126043/Documents/testimages/9935.jpg'>
    <box top='329' left='510' width='385' height='534'>
    <label>Pirelli
    </label></box>
    <box top='360' left='113' width='440' height='147'>
    <label>Pirelli
    </label></box></image>
    <image file='/home/user126043/Documents/testimages/9921.jpg'>
    <box top='329' left='510' width='385' height='534'>
    <label>Pirelli
    </label></box></image>
    </images></dataset>
1 Answer

You can try the xml.etree.ElementTree module:

import xml.etree.ElementTree as ET

tree = ET.parse('dataset.xml')
root = tree.getroot()
images = root.find('images')  # look up <images> by name instead of by position
file_dict = dict()  # maps each file path to the first <image> element seen for it
# iterate over a copy of the child list, because removing elements while iterating skips some of them
for image in list(images.findall('image')):
    file_str = image.get('file')
    if file_str in file_dict:
        images.remove(image)  # remove the duplicate <image> element
        file_dict[file_str].extend(image.findall('box'))  # move all of its boxes into the first <image> for this file
    else:
        file_dict[file_str] = image
print(ET.tostring(root))

The new root will be:

<dataset><images>
<image file="/home/user126043/Documents/testimages/9941.jpg">
<box height="147" left="113" top="360" width="440">
<label>Pirelli
</label></box></image>
<image file="/home/user126043/Documents/testimages/99.jpg">
<box height="276" left="247" top="160" width="228">
<label>Pirelli
</label></box><box height="276" left="247" top="439" width="506">
<label>Pirelli
</label></box></image>
</images></dataset>
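
To try the merge without touching a file, here is a minimal sketch of the same idea wrapped in a function and run on an inline string. The function name merge_duplicate_images and the sample paths a.jpg and b.jpg are made up for illustration; the sketch assumes the <image> elements sit directly under <images>, as in the question.

```python
import xml.etree.ElementTree as ET


def merge_duplicate_images(root):
    """Move the <box> children of repeated <image> elements into the first
    <image> element that has the same 'file' attribute."""
    images = root.find('images')
    first_seen = {}  # file path -> first <image> element with that path
    # Iterate over a copy so that removing children does not skip elements.
    for image in list(images.findall('image')):
        path = image.get('file')
        if path in first_seen:
            first_seen[path].extend(image.findall('box'))
            images.remove(image)
        else:
            first_seen[path] = image
    return root


# Hypothetical sample data with the same layout as the question's file.
sample = """<dataset><name>imglab dataset</name><images>
<image file='a.jpg'><box top='1' left='2' width='3' height='4'><label>Pirelli</label></box></image>
<image file='a.jpg'><box top='5' left='6' width='7' height='8'><label>Pirelli</label></box></image>
<image file='b.jpg'><box top='9' left='10' width='11' height='12'><label>Pirelli</label></box></image>
</images></dataset>"""

root = merge_duplicate_images(ET.fromstring(sample))
for image in root.iter('image'):
    # Print each remaining file path with the number of boxes it now holds.
    print(image.get('file'), len(image.findall('box')))
```

Keeping the first element for each path in a dictionary avoids building an XPath expression from the file name, which could break if a path ever contained a quote character.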
7 Comments

Many thanks. I tried the code, but I am getting the following error: Traceback (most recent call last): File "<stdin>", line 4, in <module> File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 337, in remove self._children.remove(element) ValueError: list.remove(x): x not in list
Are you sure you are using the same code? Please note that it is root[0].remove(image), not root.remove(image).
Can you print out your root[0] with print(ET.tostring(root[0])) before the remove?
Thanks a ton for your message. The following is the output: <name>imglab dataset</name> Traceback (most recent call last): File "clrxml2.py", line 9, in <module> root[0].remove(image) #remove the duplicate one File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 337, in remove self._children.remove(element) ValueError: list.remove(x): x not in list
Not sure if the error is because of the name tag. I am updating the XML again for your reference.
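
The printed <name> element suggests a likely cause: in this file root[0] is <name>, not <images>, so the <image> being removed is not one of its children. The following is a small sketch of that situation, assuming a file laid out like the one in the question (the path a.jpg is made up); it is a guess at the cause, not a confirmed diagnosis.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the question's layout: <name> and <comment> come before <images>.
root = ET.fromstring(
    "<dataset><name>imglab dataset</name><comment>Created by imglab tool.</comment>"
    "<images><image file='a.jpg'/></images></dataset>"
)
image = root.find('images')[0]

print(root[0].tag)  # the first child is <name>, not <images>

try:
    root[0].remove(image)  # <image> is not a child of <name>
    error = None
except ValueError as exc:
    error = exc
print(type(error).__name__)

# Looking the parent up by tag name does not depend on the order of the children.
images = root.find('images')
images.remove(image)
print(len(images))
```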