Parse xsd with values [python]

Question

I'm trying to examine and extract some data from an XML file using Python. I'm doing this by parsing with etree and then looping through the elements:

import xml.etree.ElementTree as etree

root = etree.fromstring(xml_string)  # xml_string holds the XML text to parse

for element in root.iter():
    print("%s , %s , %s" % (element.tag, element.attrib, element.text))

This works fine for some test data, but the actual XML files I'm working with seem to contain xsd tags along with the data. Below is an example:

<wdtf:observationMember>
  <wdtf:TimeSeriesObservation gml:id="ts1">
    <gml:description>Reading using DTW (Depth To Water) from TOC</gml:description>
    <gml:name codeSpace="http://www.bom.gov.au/std/water/xml/wio0.2/feature/TimeSeriesObservation/w00066/12/A/GroundWaterLevel/">1</gml:name>
    <om:procedure xlink:href="#gwTOC12" />
    <om:observedProperty xlink:href="http://www.bom.gov.au/std/water/xml/wio0.2/property//bom/GroundWaterLevel_m" />
    <om:featureOfInterest xlink:href="http://www.bom.gov.au/std/water/xml/wio0.2/feature/BorePipeSamplingInterval/w00066/12" />
    <wdtf:metadata>
      <wdtf:TimeSeriesObservationMetadata>
        <wdtf:regulationProperty>Reg200806.s3.2a</wdtf:regulationProperty>
        <wdtf:status>validated</wdtf:status>
      </wdtf:TimeSeriesObservationMetadata>
    </wdtf:metadata>
    <wdtf:result>
      <wdtf:TimeSeries>
        <wdtf:defaultInterpolationType>InstVal</wdtf:defaultInterpolationType>
        <wdtf:defaultUnitsOfMeasure>m</wdtf:defaultUnitsOfMeasure>
        <wdtf:defaultQuality>quality-A</wdtf:defaultQuality>
        <wdtf:timeValuePair time="1915-12-09T12:00:00+10:00">51.82</wdtf:timeValuePair>
        <wdtf:timeValuePair time="1917-12-18T12:00:00+10:00">41.38</wdtf:timeValuePair>
        <wdtf:timeValuePair time="1924-05-23T12:00:00+10:00">21.95</wdtf:timeValuePair>
        <wdtf:timeValuePair time="1988-02-02T12:00:00+10:00">7.56</wdtf:timeValuePair>
      </wdtf:TimeSeries>
    </wdtf:result>
  </wdtf:TimeSeriesObservation>
</wdtf:observationMember>

Using this XML in the code above causes etree to raise an error:

Traceback (most recent call last):
  File "xml_test2.py", line 38, in <module>
    root = etree.fromstring(xml_string)
  File "<string>", line 124, in XML
ParseError: unbound prefix: line 1, column 4
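A minimal reproduction, assuming a trimmed version of the fragment above: the `wdtf:` prefix is used without any `xmlns:wdtf="..."` declaration in scope, so the expat-based parser behind ElementTree rejects it with a `ParseError` whose message starts with "unbound prefix". The helper `try_parse` is just a local convenience for this sketch, and the exact column reported may differ from the one in the question.

```python
import xml.etree.ElementTree as etree

# Trimmed fragment: the wdtf: prefix is used but never declared.
fragment = """<wdtf:observationMember>
  <wdtf:TimeSeriesObservation>
    <wdtf:timeValuePair time="1915-12-09T12:00:00+10:00">51.82</wdtf:timeValuePair>
  </wdtf:TimeSeriesObservation>
</wdtf:observationMember>"""

def try_parse(text):
    """Return None if the text parses, otherwise the ParseError raised."""
    try:
        etree.fromstring(text)
        return None
    except etree.ParseError as err:
        return err

error = try_parse(fragment)
print(error)           # "unbound prefix: line ..., column ..."
print(error.position)  # (line, column) tuple pointing at the offending tag
```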

Is there a different parser I should be using? Or can I remove the xsd tags somehow?

Thanks

Hi Mike, the error I get is the one in the traceback above: ParseError: unbound prefix: line 1, column 4 – jprockbelly, Commented Apr 4, 2013 at 1:58
Are you loading the XML from a file? If you are, you should use root = etree.parse("myfile.xsd").getroot() instead of root = etree.fromstring(xml_string). – user849425, Commented Apr 4, 2013 at 2:45
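For reference, the two entry points differ in what they return: `etree.parse()` takes a filename or file object and returns an `ElementTree`, so `.getroot()` is needed, while `etree.fromstring()` takes a string and returns the root `Element` directly. In the sketch below `io.StringIO` stands in for a real file. Switching to `parse()` does not by itself fix the "unbound prefix" error, because both go through the same parser and the namespace declarations still have to be present.

```python
import io
import xml.etree.ElementTree as etree

xml_text = "<root><child>value</child></root>"

# fromstring() parses a string and returns the root Element directly.
root_a = etree.fromstring(xml_text)

# parse() accepts a filename or file-like object and returns an ElementTree;
# getroot() then yields the root Element. StringIO stands in for a real file.
root_b = etree.parse(io.StringIO(xml_text)).getroot()

print(root_a.tag, root_b.tag)      # root root
print(root_b.find("child").text)   # value
```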

Petru Gardea · Accepted Answer · 2013-04-04 02:56:52Z


From what I can see in your post, your parser is namespace aware and is complaining that the XML namespace prefixes (wdtf:, gml:, om:, ...) are not bound to any namespace URI. Assuming that <wdtf:observationMember> is your topmost element, it needs at least the following declaration:

<wdtf:observationMember xmlns:wdtf="some-uri">

The same applies to all other prefixes, such as gml, om, xlink, etc.
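One way to apply this to a clipped fragment is to wrap it in a dummy parent element that declares every prefix it uses. The helper `parse_fragment` and the `urn:example:...` URIs in `NAMESPACES` are hypothetical placeholders (in the spirit of `some-uri` above); the real URIs should be copied from the `xmlns:` declarations at the top of the complete document the fragment was taken from. Once the prefixes are bound, ElementTree reports each tag and prefixed attribute as `{namespace-uri}localname`, so the question's loop prints the URI rather than the prefix.

```python
import xml.etree.ElementTree as etree

# Hypothetical placeholder URIs: replace them with the xmlns: values declared
# at the top of the original, complete document.
NAMESPACES = {
    "wdtf": "urn:example:wdtf",
    "gml": "urn:example:gml",
    "om": "urn:example:om",
    "xlink": "urn:example:xlink",
}

def parse_fragment(fragment, namespaces):
    """Hypothetical helper: wrap a clipped fragment in a dummy element that
    declares every prefix, then return the fragment's own top element."""
    decls = " ".join('xmlns:%s="%s"' % (p, uri) for p, uri in namespaces.items())
    wrapper = etree.fromstring("<wrapper %s>%s</wrapper>" % (decls, fragment))
    return wrapper[0]

fragment = """<wdtf:observationMember>
  <wdtf:TimeSeriesObservation gml:id="ts1">
    <om:procedure xlink:href="#gwTOC12" />
    <wdtf:timeValuePair time="1915-12-09T12:00:00+10:00">51.82</wdtf:timeValuePair>
  </wdtf:TimeSeriesObservation>
</wdtf:observationMember>"""

root = parse_fragment(fragment, NAMESPACES)

# Same loop as in the question; tags now appear as {uri}localname.
for element in root.iter():
    print("%s , %s , %s" % (element.tag, element.attrib, element.text))
```

The wrapping trick assumes the fragment has no `<?xml ...?>` declaration of its own, since that is only allowed at the very start of a document.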


Thanks Petru, I think I have it sorted out now. I was clipping the observationMember element out of a much larger block of XML and missed the namespace declarations. – jprockbelly
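An alternative to clipping fragments by hand is to parse the complete document, where the namespace declarations are already in scope, and select elements with `findall()` plus a prefix-to-URI mapping. The sketch below assumes a small stand-in document; its root element `wdtf:Collection` and the `urn:example:...` URIs are hypothetical placeholders, and the real document's root name and URIs may differ.

```python
import xml.etree.ElementTree as etree

# Stand-in for the full document; the root element name and the namespace
# URIs are hypothetical placeholders.
document = """<wdtf:Collection xmlns:wdtf="urn:example:wdtf" xmlns:gml="urn:example:gml">
  <wdtf:observationMember>
    <wdtf:TimeSeriesObservation gml:id="ts1">
      <wdtf:result>
        <wdtf:TimeSeries>
          <wdtf:defaultUnitsOfMeasure>m</wdtf:defaultUnitsOfMeasure>
          <wdtf:timeValuePair time="1915-12-09T12:00:00+10:00">51.82</wdtf:timeValuePair>
          <wdtf:timeValuePair time="1917-12-18T12:00:00+10:00">41.38</wdtf:timeValuePair>
        </wdtf:TimeSeries>
      </wdtf:result>
    </wdtf:TimeSeriesObservation>
  </wdtf:observationMember>
</wdtf:Collection>"""

# Prefix-to-URI map for find/findall paths; it must match the document's URIs.
ns = {"wdtf": "urn:example:wdtf", "gml": "urn:example:gml"}

root = etree.fromstring(document)

readings = []
for member in root.findall("wdtf:observationMember", ns):
    obs = member.find("wdtf:TimeSeriesObservation", ns)
    series_id = obs.get("{urn:example:gml}id")
    # ".//" searches all descendants of the observation for value pairs.
    for pair in obs.findall(".//wdtf:timeValuePair", ns):
        readings.append((series_id, pair.get("time"), float(pair.text)))

for row in readings:
    print(row)
```

The prefixes in `ns` only need to map to the right URIs; they do not have to match the prefixes used inside the document.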
