
I'm trying to examine and extract some data from an XML file using Python. I'm doing this by parsing it with ElementTree and then looping through the elements:

import xml.etree.ElementTree as etree

# xml_string holds the XML text read from the file
root = etree.fromstring(xml_string)

for element in root.iter():
    print("%s , %s , %s" % (element.tag, element.attrib, element.text))

This works fine on some test data, but the actual XML files I'm working with seem to contain xsd tags along with the data. Below is an example:

<wdtf:observationMember>
  <wdtf:TimeSeriesObservation gml:id="ts1">
    <gml:description>Reading using DTW (Depth To Water) from TOC</gml:description>
    <gml:name codeSpace="http://www.bom.gov.au/std/water/xml/wio0.2/feature/TimeSeriesObservation/w00066/12/A/GroundWaterLevel/">1</gml:name>
    <om:procedure xlink:href="#gwTOC12" />
    <om:observedProperty xlink:href="http://www.bom.gov.au/std/water/xml/wio0.2/property//bom/GroundWaterLevel_m" />
    <om:featureOfInterest xlink:href="http://www.bom.gov.au/std/water/xml/wio0.2/feature/BorePipeSamplingInterval/w00066/12" />
    <wdtf:metadata>
      <wdtf:TimeSeriesObservationMetadata>
        <wdtf:regulationProperty>Reg200806.s3.2a</wdtf:regulationProperty>
        <wdtf:status>validated</wdtf:status>
      </wdtf:TimeSeriesObservationMetadata>
    </wdtf:metadata>
    <wdtf:result>
      <wdtf:TimeSeries>
        <wdtf:defaultInterpolationType>InstVal</wdtf:defaultInterpolationType>
        <wdtf:defaultUnitsOfMeasure>m</wdtf:defaultUnitsOfMeasure>
        <wdtf:defaultQuality>quality-A</wdtf:defaultQuality>
        <wdtf:timeValuePair time="1915-12-09T12:00:00+10:00">51.82</wdtf:timeValuePair>
        <wdtf:timeValuePair time="1917-12-18T12:00:00+10:00">41.38</wdtf:timeValuePair>
        <wdtf:timeValuePair time="1924-05-23T12:00:00+10:00">21.95</wdtf:timeValuePair>
        <wdtf:timeValuePair time="1988-02-02T12:00:00+10:00">7.56</wdtf:timeValuePair>
      </wdtf:TimeSeries>
    </wdtf:result>
  </wdtf:TimeSeriesObservation>
</wdtf:observationMember>

Using this XML in the code above causes etree to raise an error:

Traceback (most recent call last):
  File "xml_test2.py", line 38, in <module>
    root = etree.fromstring(xml_string)
  File "<string>", line 124, in XML
ParseError: unbound prefix: line 1, column 4
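A minimal reproduction of the same error, assuming Python 3: the fragment below (a cut-down, hypothetical piece of the sample above) uses the `wdtf:` prefix without any `xmlns:wdtf="..."` declaration, so the parser cannot bind the prefix to a namespace URI.

```python
import xml.etree.ElementTree as etree

# The "wdtf:" prefix is used but never declared with xmlns:wdtf="...",
# just like the clipped snippet above.
fragment = "<wdtf:observationMember><wdtf:status>validated</wdtf:status></wdtf:observationMember>"

try:
    etree.fromstring(fragment)
except etree.ParseError as err:
    # str(err) has the form "<reason>: line N, column M"
    message = str(err)
    print(message)
```

The exact column number depends on where the first undeclared prefix appears, which is why the original file reports column 4.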

Is there a different parser I should be using? Or can I remove the xsd tags somehow?

Thanks

    Please post the actual error message. Commented Apr 4, 2013 at 1:42
  • Hi Mike, the error I get is: Traceback (most recent call last): File "xml_test2.py", line 38, in <module> root = etree.fromstring(xml_string) File "<string>", line 124, in XML ParseError: unbound prefix: line 1, column 4 Commented Apr 4, 2013 at 1:58
  • Are you loading the XML from a file? If you are, you should use root = etree.parse("myfile.xsd").getroot() instead of root = etree.fromstring(xml_string). Commented Apr 4, 2013 at 2:45
  • OK, will do, thanks Mike. Commented Apr 4, 2013 at 4:26
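Along the lines of the file-loading comment, here is a minimal sketch of `etree.parse()` on a file, assuming Python 3. The file name comes from `tempfile` and the namespace URI `urn:example:wdtf` is a placeholder. Note that `parse()` follows the same namespace rules as `fromstring()`, so a file whose prefixes are undeclared still raises the unbound-prefix `ParseError`.

```python
import os
import tempfile
import xml.etree.ElementTree as etree

# Hypothetical sample; the xmlns:wdtf URI is a placeholder.
xml_text = '<wdtf:status xmlns:wdtf="urn:example:wdtf">validated</wdtf:status>'

# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False, encoding="utf-8") as fh:
    fh.write(xml_text)
    path = fh.name

try:
    # parse() accepts a filename or file object and returns an ElementTree;
    # getroot() gives the root Element, the same thing fromstring() returns.
    root = etree.parse(path).getroot()
finally:
    os.remove(path)

print(root.tag, root.text)  # {urn:example:wdtf}status validated
```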

1 Answer


From what I can see in your post, your parser is namespace-aware and is complaining that the XML namespace prefixes are not bound to any namespace. Assuming that <wdtf:observationMember> is your topmost element, you need at least the following:

<wdtf:observationMember xmlns:wdtf="some-uri">

The same applies to all the other prefixes, such as gml, om, xlink, etc.
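A sketch of what that looks like with a trimmed copy of the sample from the question, assuming Python 3. The `wdtf`, `gml` and `om` URIs below are placeholders (the real ones come from the `xmlns:...` declarations on the enclosing document); only the xlink URI is the standard one. Once every prefix is bound, the original loop runs unchanged.

```python
import xml.etree.ElementTree as etree

# Placeholder URIs for wdtf/gml/om; take the real ones from the full document.
xml_string = """\
<wdtf:observationMember
    xmlns:wdtf="urn:example:wdtf"
    xmlns:gml="urn:example:gml"
    xmlns:om="urn:example:om"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <wdtf:TimeSeriesObservation gml:id="ts1">
    <om:procedure xlink:href="#gwTOC12" />
    <wdtf:result>
      <wdtf:TimeSeries>
        <wdtf:timeValuePair time="1915-12-09T12:00:00+10:00">51.82</wdtf:timeValuePair>
      </wdtf:TimeSeries>
    </wdtf:result>
  </wdtf:TimeSeriesObservation>
</wdtf:observationMember>
"""

# No ParseError now: every prefix used in the fragment is declared on the root.
root = etree.fromstring(xml_string)

for element in root.iter():
    # Tags and attribute names come back as "{namespace-uri}localname".
    print("%s , %s , %s" % (element.tag, element.attrib, element.text))
```

Note that the printed tags use the `{uri}localname` form rather than the `wdtf:` prefixes, because ElementTree discards prefixes after resolving them.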


1 Comment

Thanks Petru, I think I have it sorted out now. I was clipping the observationMember out of a much larger block of XML and missed the namespace declarations.
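One way to avoid losing the declarations when working from a larger file is to parse the whole document and select the members from the tree instead of clipping the raw text. A minimal sketch, assuming Python 3; the enclosing root element `wdtf:ExampleCollection` and the URI `urn:example:wdtf` are hypothetical stand-ins for whatever the real file uses.

```python
import xml.etree.ElementTree as etree

# Prefix -> URI map for find()/findall(); the URI is a placeholder that must
# match the xmlns:wdtf="..." declaration in the real document.
NS = {"wdtf": "urn:example:wdtf"}

# Hypothetical larger document: the namespace is declared once on the root
# and inherited by every observationMember inside it.
full_document = """\
<wdtf:ExampleCollection xmlns:wdtf="urn:example:wdtf">
  <wdtf:observationMember>
    <wdtf:TimeSeriesObservation>
      <wdtf:result>
        <wdtf:TimeSeries>
          <wdtf:timeValuePair time="1915-12-09T12:00:00+10:00">51.82</wdtf:timeValuePair>
          <wdtf:timeValuePair time="1917-12-18T12:00:00+10:00">41.38</wdtf:timeValuePair>
        </wdtf:TimeSeries>
      </wdtf:result>
    </wdtf:TimeSeriesObservation>
  </wdtf:observationMember>
</wdtf:ExampleCollection>
"""

root = etree.fromstring(full_document)

# Collect (time, value) pairs from every observationMember in the parsed tree.
readings = []
for member in root.findall("wdtf:observationMember", NS):
    for pair in member.iterfind(".//wdtf:timeValuePair", NS):
        readings.append((pair.get("time"), float(pair.text)))

print(readings)
# [('1915-12-09T12:00:00+10:00', 51.82), ('1917-12-18T12:00:00+10:00', 41.38)]
```

If the clipped text itself is still needed, `etree.tostring(member)` re-serializes the element with the required namespace declaration included (ElementTree writes `ns0:`-style prefixes unless `etree.register_namespace("wdtf", ...)` is called first).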
