I'm trying to build a script to read an xml file. This is my first time parsing an xml and i'm doing it using python with xml.etree.ElementTree. The section of the file that i would like to process looks like:
<component>
<section>
<id root="42CB916B-BB58-44A0-B8D2-89B4B27F04DF" />
<code code="34089-3" codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC" displayName="DESCRIPTION SECTION" />
<title mediaType="text/x-hl7-title+xml">DESCRIPTION</title>
<text>
<paragraph>Renese<sup>®</sup> is designated generically as polythiazide, and chemically as 2<content styleCode="italics">H</content>-1,2,4-Benzothiadiazine-7-sulfonamide, 6-chloro-3,4-dihydro-2-methyl-3-[[(2,2,2-trifluoroethyl)thio]methyl]-, 1,1-dioxide. It is a white crystalline substance, insoluble in water but readily soluble in alkaline solution.</paragraph>
<paragraph>Inert Ingredients: dibasic calcium phosphate; lactose; magnesium stearate; polyethylene glycol; sodium lauryl sulfate; starch; vanillin. The 2 mg tablets also contain: Yellow 6; Yellow 10.</paragraph>
</text>
<effectiveTime value="20051214" />
</section>
</component>
<component>
<section>
<id root="CF5D392D-F637-417C-810A-7F0B3773264F" />
<code code="42229-5" codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC" displayName="SPL UNCLASSIFIED SECTION" />
<title mediaType="text/x-hl7-title+xml">ACTION</title>
<text>
<paragraph>The mechanism of action results in an interference with the renal tubular mechanism of electrolyte reabsorption. At maximal therapeutic dosage all thiazides are approximately equal in their diuretic potency. The mechanism whereby thiazides function in the control of hypertension is unknown.</paragraph>
</text>
<effectiveTime value="20051214" />
</section>
</component>
The full file can be downloaded from:
Here my code:
import xml.etree.ElementTree as ElementTree
import re
with open("ABD6ECF0-DC8E-41DE-89F2-1E36ED9D6535.xml") as f:
xmlstring = f.read()
# Remove the default namespace definition (xmlns="http://some/namespace")
xmlstring = re.sub(r'\sxmlns="[^"]+"', '', xmlstring, count=1)
tree = ElementTree.fromstring(xmlstring)
for title in tree.iter('title'):
print(title.text)
So far I'm able to print the titles but I would like to print also the corresponding text that is captured in the tags.
I have tried this:
for title in tree.iter('title'):
print(title.text)
for paragraph in title.iter('paragraph'):
print(paragraph.text)
But I have no output from the paragraph.text
Doing
for title in tree.iter('title'):
print(title.text)
for paragraph in tree.iter('paragraph'):
print(paragraph.text)
I print the text of the paragraphs but (obviously) it is printed all together for each title found in the xml structure.
I would like to find a way to 1) identify the title; 2) print the corresponding paragraph(s). How can I do it?