I'm trying to examine and extract some data from an xml file using python. I'm doing this by parsing with etree then looping through the elements:
import xml.etree.ElementTree as etree
root = etree.fromstring(xml_string)
for element in root.iter():
print("%s , %s , %s" % (element.tag, element.attrib, element.text))
This works fine for some test data, but the actual xml files that I'm working with seem to contain xsd tags along with the data. Below is an example
<wdtf:observationMember>
<wdtf:TimeSeriesObservation gml:id="ts1">
<gml:description>Reading using DTW (Depth To Water) from TOC</gml:description>
<gml:name codeSpace="http://www.bom.gov.au/std/water/xml/wio0.2/feature/TimeSeriesObservation/w00066/12/A/GroundWaterLevel/">1</gml:name>
<om:procedure xlink:href="#gwTOC12" />
<om:observedProperty xlink:href="http://www.bom.gov.au/std/water/xml/wio0.2/property//bom/GroundWaterLevel_m" />
<om:featureOfInterest xlink:href="http://www.bom.gov.au/std/water/xml/wio0.2/feature/BorePipeSamplingInterval/w00066/12" />
<wdtf:metadata>
<wdtf:TimeSeriesObservationMetadata>
<wdtf:regulationProperty>Reg200806.s3.2a</wdtf:regulationProperty>
<wdtf:status>validated</wdtf:status>
</wdtf:TimeSeriesObservationMetadata>
</wdtf:metadata>
<wdtf:result>
<wdtf:TimeSeries>
<wdtf:defaultInterpolationType>InstVal</wdtf:defaultInterpolationType>
<wdtf:defaultUnitsOfMeasure>m</wdtf:defaultUnitsOfMeasure>
<wdtf:defaultQuality>quality-A</wdtf:defaultQuality>
<wdtf:timeValuePair time="1915-12-09T12:00:00+10:00">51.82</wdtf:timeValuePair>
<wdtf:timeValuePair time="1917-12-18T12:00:00+10:00">41.38</wdtf:timeValuePair>
<wdtf:timeValuePair time="1924-05-23T12:00:00+10:00">21.95</wdtf:timeValuePair>
<wdtf:timeValuePair time="1988-02-02T12:00:00+10:00">7.56</wdtf:timeValuePair>
</wdtf:TimeSeries>
</wdtf:result>
</wdtf:TimeSeriesObservation>
</wdtf:observationMember>
Useing this xml in the code above causes etree to return an error:
Traceback (most recent call last):
File "xml_test2.py", line 38, in <module>
root = etree.fromstring(xml_string)
File "<string>", line 124, in XML
ParseError: unbound prefix: line 1, column 4
Is there a different parser I should be using? Or can I remove the xsc tags some how?
Thanks
root = etree.parse("myfile.xsd").getroot()"instead ofroot = etree.fromstring(xml_string).