Delete an element from an XML file using Python

Question

I have been trying to delete the structuredBody element (which is inside a component element) from the following document, but my code does not work.

The simplified structure of the XML source file:

<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
...
  <component>
    <structuredBody>
    ...
    ...
    </structuredBody>
  </component>
</ClinicalDocument>

Here is the code I'm using:

import xml.etree.ElementTree as ET
from lxml import objectify, etree

cda_tree = etree.parse('ELGA-023-Entlassungsbrief_aerztlich_EIS-FullSupport.xml')
cda_root = cda_tree.getroot()
for e in cda_root: 
    ET.register_namespace("", "urn:hl7-org:v3")

for node in cda_tree.xpath('//component/structuredBody'):
    node.getparent().remove(node)

cda_tree.write('newXML.xml')

Whenever I run the code, the newXML.xml file still has the structuredBody element.

Thanks in advance!

Please post a minimal reproducible example. Include a well-formed snippet of the XML as part of your question. The XML file you linked to also contains a lot of irrelevant data that has no bearing on your question.
– user5386938, Commented Apr 14, 2021 at 17:48
@JustinEzequiel I also provided the simplified structure of the XML file, hope this helps.
– igotPOWA, Commented Apr 14, 2021 at 17:52
@larsks I'm using ET to delete the namespaces which are added automatically by the .parse function, and with etree I'm trying to delete the structuredBody element.
– igotPOWA, Commented Apr 14, 2021 at 17:57
But you're not using ET. You're using lxml, so you need to manage namespaces the way lxml expects.
– larsks, Commented Apr 14, 2021 at 17:58

larsks · Accepted Answer · 2021-04-14 17:58:30Z

Based on your most recent edit, I think you'll find the problem is that your for loop isn't matching any nodes. As far as XPath is concerned, your document doesn't contain any un-namespaced elements named component or structuredBody. The xmlns="urn:hl7-org:v3" declaration on the root element means that all unprefixed elements in the document are in that namespace by default, so you need to use that namespace when matching elements:

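from lxml import etree

cda_tree = etree.parse('data.xml')

# Map a prefix of our choosing to the document's default namespace.
# XPath has no notion of a default namespace, so a prefix is required.
ns = {
    'hl7': 'urn:hl7-org:v3',
}

# Match the namespaced elements and detach each one from its parent.
for node in cda_tree.xpath('//hl7:component/hl7:structuredBody', namespaces=ns):
    node.getparent().remove(node)

cda_tree.write('newXML.xml')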

With the above code, if the input looks like this:

<ClinicalDocument
    xmlns="urn:hl7-org:v3"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <component>
    <structuredBody>
    ...
    ...
    </structuredBody>
  </component>
</ClinicalDocument>

The output looks like:

<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <component>
    </component>
</ClinicalDocument>
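
The same namespace rule applies if you would rather use the standard library's xml.etree.ElementTree, which your question imports but never uses for parsing. The following is only a minimal sketch, not part of the code above: the inline XML string and the `<section/>` child are stand-ins I made up for your file, and the `hl7` prefix is an arbitrary name. ElementTree elements have no getparent(), so the sketch selects the parent elements and removes the child from each one.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for the real CDA file.
XML = """<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <component>
    <structuredBody>
      <section/>
    </structuredBody>
  </component>
</ClinicalDocument>"""

# The prefix "hl7" is arbitrary; only the URI has to match the document.
NS = {"hl7": "urn:hl7-org:v3"}

root = ET.fromstring(XML)

# Without a namespace the path matches nothing, which is why nothing was deleted.
unqualified = root.findall(".//component/structuredBody")
# With the namespace mapping the element is found.
qualified = root.findall(".//hl7:component/hl7:structuredBody", NS)
print(len(unqualified), len(qualified))

# No getparent() in ElementTree: find each parent, then remove its matching children.
for parent in root.findall(".//hl7:component", NS):
    for child in parent.findall("hl7:structuredBody", NS):
        parent.remove(child)

remaining = root.findall(".//hl7:structuredBody", NS)

# Registering the URI with an empty prefix affects ElementTree serialization only;
# it has no effect on trees parsed and written by lxml.
ET.register_namespace("", "urn:hl7-org:v3")
output = ET.tostring(root, encoding="unicode")
print(output)
```

This also shows where ET.register_namespace fits: it controls how ElementTree writes prefixes on output and does nothing for matching, so it would not have helped the original loop either way.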

2 Comments

igotPOWA Over a year ago

Thank you a lot @larsks, now it is deleting the structuredBody element and the namespaces are handled much better now. Your help is really appreciated!

larsks Over a year ago

If this answer has resolved your problem, consider clicking the checkmark to the left of the answer in order to mark it as "accepted".
