
I have the following XML file:

<annotation>
    <folder>JPEGImages</folder>
    <filename>01FQ0YY92XRX5MDWGYC2RJ1CP4.jpeg</filename>
    <path>D:\aVisionData\PVL Pilot Project\test\Annotation\JPEGImages\01FQ0YY92XRX5MDWGYC2RJ1CP4.jpeg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>601</width>
        <height>844</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>smallObject</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>329</xmin>
            <ymin>199</ymin>
            <xmax>376</xmax>
            <ymax>242</ymax>
        </bndbox>
    </object>
</annotation>

I want to remove <path> and replace the contents of <source> so that the file looks like this:

<annotation>
    <folder>JPEGImages</folder>
    <filename>01FQ0YY92XRX5MDWGYC2RJ1CP4.jpeg</filename>
    <source>
        <database>objects</database>
        <annotation>custom</annotation>
        <image>custom</image>
    </source>
    <size>
        <width>601</width>
        <height>844</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>smallObject</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>329</xmin>
            <ymin>199</ymin>
            <xmax>376</xmax>
            <ymax>242</ymax>
        </bndbox>
    </object>
</annotation>

To remove <path>, I used the following code:

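import os
import sys
import xml.etree.ElementTree as Et

# inputAnnotationPath and annotation are defined earlier (not shown)
file_path = os.path.join(inputAnnotationPath, annotation)
tr = Et.parse(file_path)
for element in tr.iter():
    # iterate over a copy so that removing a child does not skip its sibling
    for subElement in list(element):
        print(subElement)
        if subElement.tag == "path":
            element.remove(subElement)
# encoding="unicode" makes write() emit str, which sys.stdout expects in Python 3
tr.write(sys.stdout, encoding="unicode")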

It runs without errors, but <path> is not removed. What should I change to remove <path> and modify <source>?

  • Possibly related. Commented Dec 28, 2021 at 9:26
  • It seems like a good job for XSLT, which can be run from Python as well. Commented Dec 28, 2021 at 9:28
  • I concur with @MartinHonnen Commented Dec 28, 2021 at 14:46
  • I've run your code against the sample input XML and I can see that path is removed. I also see that source is unmodified, but there's no logic/code for that, so I'm not surprised. I'm running 3.8.9. Commented Dec 28, 2021 at 19:50

1 Answer

It's pretty simple if you can use lxml:

from lxml import etree

# file_path is the path to the annotation file, as in the question
parser = etree.XMLParser(recover=True)
tr = etree.parse(file_path, parser=parser)

# select both <path> and <source> for removal
targets = [tr.xpath('//path')[0], tr.xpath('//source')[0]]

# select the destination for the new <source> element
destination = tr.xpath('//filename')[0]

# recreate <source>
new_source = """
    <source>
        <database>objects</database>
        <annotation>custom</annotation>
        <image>custom</image>
    </source>"""

# remove what needs to be removed
for target in targets:
    target.getparent().remove(target)

# insert the new <source> element right after <filename>
destination.addnext(etree.fromstring(new_source))

# save to file
with open("output.xml", "wb") as f:
    f.write(etree.tostring(tr))
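
If lxml is not an option, roughly the same result can probably be reached with the standard library alone. The following is only a sketch under some assumptions: SAMPLE is a shortened, made-up stand-in for the annotation file, clean_annotation is a hypothetical helper name, and <source> is rebuilt in place rather than removed and re-inserted, so there is no need to locate <filename>.

```python
import xml.etree.ElementTree as ET

# Shortened, made-up stand-in for the annotation file in the question.
SAMPLE = """<annotation>
    <folder>JPEGImages</folder>
    <filename>example.jpeg</filename>
    <path>D:\\data\\example.jpeg</path>
    <source>
        <database>Unknown</database>
    </source>
    <segmented>0</segmented>
</annotation>"""


def clean_annotation(root):
    """Drop every <path> element and rebuild the children of <source>."""
    # Work on copies (list(...)) so removing a child does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "path":
                parent.remove(child)

    source = root.find("source")
    if source is not None:
        # Keep the <source> element itself so its position stays the same;
        # only its children are replaced.
        for child in list(source):
            source.remove(child)
        new_children = (
            ("database", "objects"),
            ("annotation", "custom"),
            ("image", "custom"),
        )
        for tag, text in new_children:
            ET.SubElement(source, tag).text = text
    return root


root = clean_annotation(ET.fromstring(SAMPLE))

# ET.indent only exists on Python 3.9+, so guard the call.
if hasattr(ET, "indent"):
    ET.indent(root, space="    ")

print(ET.tostring(root, encoding="unicode"))
```

For a real file, ET.parse(file_path).getroot() would take the place of ET.fromstring(SAMPLE), and ET.ElementTree(root).write("output.xml") would save the result.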

2 Comments

Where is tr declared? Looks like it should be the result of etree.parse(). And, nothing was said about filename.
@ZachYoung Oops! Good catch - fixed.
