I have been trying to delete the structuredBody element (which is nested inside a component element) from the following document, but my code doesn't work.

Simplified structure of the XML source file:

<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
...
  <component>
    <structuredBody>
    ...
    ...
    </structuredBody>
  </component>
</ClinicalDocument>

Here is the code I'm using:

import xml.etree.ElementTree as ET
from lxml import objectify, etree

cda_tree = etree.parse('ELGA-023-Entlassungsbrief_aerztlich_EIS-FullSupport.xml')
cda_root = cda_tree.getroot()
for e in cda_root: 
    ET.register_namespace("", "urn:hl7-org:v3")

for node in cda_tree.xpath('//component/structuredBody'):
    node.getparent().remove(node)

cda_tree.write('newXML.xml')

Whenever I run the code, the newXML.xml file still has the structuredBody element.

Thanks in advance!

  • Please post a minimal reproducible example. Include a well-formed snippet of the XML as part of your question. The XML file you linked to also contains a lot of irrelevant data which have no bearing on your question. Commented Apr 14, 2021 at 17:48
  • @JustinEzequiel I have also added the simplified structure of the XML file; I hope this helps. Commented Apr 14, 2021 at 17:52
  • Why are you importing both lxml and xml.etree? Commented Apr 14, 2021 at 17:53
  • @larsks I'm using ET to delete the namespaces that are added automatically by the .parse function, and with etree I'm trying to delete the structuredBody element. Commented Apr 14, 2021 at 17:57
  • But you're not using ET. You're using lxml, so you need to manage namespaces the way lxml expects. Commented Apr 14, 2021 at 17:58

1 Answer

Based on your most recent edit, I think you'll find the problem is that your for loop isn't matching any nodes. As far as XPath is concerned, your document doesn't contain any elements named plain component or structuredBody. The xmlns="urn:hl7-org:v3" declaration on the root element means that all unprefixed elements in the document are in that namespace by default, so you need to use that namespace when matching elements:

from lxml import etree

cda_tree = etree.parse('data.xml')

# Map a prefix of our choosing to the document's default namespace URI.
ns = {
    'hl7': 'urn:hl7-org:v3',
}

# Qualify every step of the XPath with that prefix so the elements match.
for node in cda_tree.xpath('//hl7:component/hl7:structuredBody', namespaces=ns):
    node.getparent().remove(node)

cda_tree.write('newXML.xml')

With the above code, if the input looks like this:

<ClinicalDocument
    xmlns="urn:hl7-org:v3"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <component>
    <structuredBody>
    ...
    ...
    </structuredBody>
  </component>
</ClinicalDocument>

The output looks like:

<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <component>
    </component>
</ClinicalDocument>
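
For comparison, the same namespace rule applies if you stay with the standard library's xml.etree.ElementTree instead of lxml. The following is only a minimal sketch under a few assumptions: SAMPLE is an inline stand-in for the real file, and remove_structured_body is a hypothetical helper name, not something from either library. ElementTree elements have no getparent(), so the sketch walks the parent elements and removes the child from each.

```python
import xml.etree.ElementTree as ET

HL7_NS = "urn:hl7-org:v3"
NS = {"hl7": HL7_NS}

# Inline stand-in for the real CDA file (hypothetical, trimmed-down content).
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component>
    <structuredBody>
      <section/>
    </structuredBody>
  </component>
</ClinicalDocument>"""


def remove_structured_body(root):
    """Remove each hl7:structuredBody that is a direct child of an hl7:component.

    Returns the number of elements removed.
    """
    removed = 0
    # Snapshot the parents first so the tree is not modified while iterating it.
    for parent in list(root.iter(f"{{{HL7_NS}}}component")):
        for body in parent.findall("hl7:structuredBody", NS):
            parent.remove(body)
            removed += 1
    return removed


root = ET.fromstring(SAMPLE)

# The default namespace is part of every element's name.
print(root.tag)  # {urn:hl7-org:v3}ClinicalDocument

# An unqualified path matches nothing; the namespace-qualified one matches.
unqualified = root.findall(".//component/structuredBody")
qualified = root.findall(".//hl7:component/hl7:structuredBody", NS)
print(len(unqualified), len(qualified))  # 0 1

removed_count = remove_structured_body(root)

# Serialize with urn:hl7-org:v3 as the default namespace rather than ns0:.
# Note that register_namespace changes global state for the whole process.
ET.register_namespace("", HL7_NS)
result = ET.tostring(root, encoding="unicode")
print(result)
```

The prefix hl7 is arbitrary; only the URI it maps to has to match the xmlns value in the document.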

2 Comments

Thank you a lot @larsks, now it is deleting the structuredBody element and the namespaces are handled much better now. Your help is really appreciated!
If this answer has resolved your problem, consider clicking the checkmark to the left of the answer in order to mark it as "accepted".
