I am trying to extract some data from a file:

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
    <d2LogicalModel xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datex2.eu/schema/2/2_0" modelBaseVersion="2">
        <exchange>
            <supplierIdentification>
                <country>nl</country>
                <nationalIdentifier>NDW-CNS</nationalIdentifier>
            </supplierIdentification>
        </exchange>
        <payloadPublication xsi:type="MeasuredDataPublication" lang="nl">
            <publicationTime>2014-12-04T06:59:55.000Z</publicationTime>
            <publicationCreator>
                <country>nl</country>
                <nationalIdentifier>NDW-CNS</nationalIdentifier>
            </publicationCreator>
            <measurementSiteTableReference id="NDW01_MT" version="662" targetClass="MeasurementSiteTable"/>
            <headerInformation>
                <confidentiality>noRestriction</confidentiality>
                <informationStatus>real</informationStatus>
            </headerInformation>
            <siteMeasurements>
                <measurementSiteReference id="GEO03_D4T-RWS_T_0317_ID_324" version="3" targetClass="MeasurementSiteRecord"/>
                <measurementTimeDefault>2014-12-04T06:58:00Z</measurementTimeDefault>
                <measuredValue index="1">
                    <measuredValue>
                        <basicData xsi:type="TravelTimeData">
                            <travelTimeType>best</travelTimeType>
                            <travelTime numberOfInputValuesUsed="100" standardDeviation="7">
                                <duration>34</duration>
                            </travelTime>
                        </basicData>
                    </measuredValue>
                </measuredValue>
            </siteMeasurements>
            <siteMeasurements>
                <measurementSiteReference id="GEO01_Z_RWSTRN054" version="1" targetClass="MeasurementSiteRecord"/>
                <measurementTimeDefault>2014-12-04T06:58:00Z</measurementTimeDefault>
                <measuredValue index="1" xsi:type="_SiteMeasurementsIndexMeasuredValue">
                    <measuredValue xsi:type="MeasuredValue">
                        <basicData xsi:type="TravelTimeData">
                            <travelTimeType>best</travelTimeType>
                            <travelTime numberOfIncompleteInputs="0" numberOfInputValuesUsed="7" standardDeviation="0.71" supplierCalculatedDataQuality="100.0">
                                <duration>56</duration>
                            </travelTime>
                        </basicData>
                    </measuredValue>
                </measuredValue>
            </siteMeasurements>
           .
           .
           .
           .
           .
           <siteMeasurements>
                <measurementSiteReference id="RWS01_MONIBAS_0091hrr0350ra0" version="1" targetClass="MeasurementSiteRecord"/>
                <measurementTimeDefault>2014-12-04T06:58:00Z</measurementTimeDefault>
                <measuredValue index="1" xsi:type="_SiteMeasurementsIndexMeasuredValue">
                    <measuredValue xsi:type="MeasuredValue">
                        <basicData xsi:type="TravelTimeData">
                            <travelTimeType>best</travelTimeType>
                            <travelTime numberOfIncompleteInputs="0">
                                <duration>23</duration>
                            </travelTime>
                        </basicData>
                    </measuredValue>
                </measuredValue>
            </siteMeasurements>
        </payloadPublication>
    </d2LogicalModel>
</soap:Body>

What I am trying to do is use Python to extract, from each block such as

             <siteMeasurements>
                <measurementSiteReference id="RWS01_MONIBAS_0091hrr0350ra0" version="1" targetClass="MeasurementSiteRecord"/>
                <measurementTimeDefault>2014-12-04T06:58:00Z</measurementTimeDefault>
                <measuredValue index="1" xsi:type="_SiteMeasurementsIndexMeasuredValue">
                    <measuredValue xsi:type="MeasuredValue">
                        <basicData xsi:type="TravelTimeData">
                            <travelTimeType>best</travelTimeType>
                            <travelTime numberOfIncompleteInputs="0">
                                <duration>23</duration>
                            </travelTime>
                        </basicData>
                    </measuredValue>
                </measuredValue>
            </siteMeasurements>

the value of the 'id' attribute of 'measurementSiteReference' and the text content of 'duration'.

I am using Python for this. My code so far:

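import xml.etree.ElementTree as ET

# Parse the file and get the root element
tree = ET.ElementTree(file='track.xml')
root = tree.getroot()

# Print the tag and attributes of every element in the tree
for elem in tree.iter():
    print(elem.tag, elem.attrib)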

But I am having difficulty extracting these values. I don't have any experience with Python.

How can I iterate through 'siteMeasurements' and get the value of the 'id' attribute of 'measurementSiteReference' and the text content of 'duration'?

Please give me some advice to help me on my way.

1 Answer

Your XML appears to be missing the closing </soap:Envelope> tag at the bottom of the file, or it may simply have been lost when you copied and pasted it here. After adding that tag, and also adding the following XML declaration as the first line, I was able to parse the file.

<?xml version="1.0" encoding="UTF-8"?>
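
A quick way to check whether a file is well-formed is to let ElementTree try to parse it and catch the error. The snippet below is only a minimal sketch: the shortened sample string and the helper name is_well_formed are made up for illustration and do not come from the question's file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample: the closing </soap:Envelope> tag is missing.
truncated = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    '<soap:Body></soap:Body>'
)


def is_well_formed(xml_text):
    """Return True if xml_text parses, False if ElementTree rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError as err:
        # The message includes the line and column where parsing stopped.
        print("Parse error:", err)
        return False


# The first call reports a parse error; the second succeeds once the tag is added.
print(is_well_formed(truncated))
print(is_well_formed(truncated + '</soap:Envelope>'))
```

In this sketch the repaired string parses even without an XML declaration, so the missing closing tag is the more likely cause of a failure.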

First, we need to figure out which elements we can iterate over.

>>> for i in root.iter():
...     print(i)

This gives a listing like the one below (truncated):

<Element '{http://schemas.xmlsoap.org/soap/envelope/}Envelope' at 0x29e4170>
<Element '{http://schemas.xmlsoap.org/soap/envelope/}Body' at 0x29e4190>
|
|
<Element '{http://datex2.eu/schema/2/2_0}measurementSiteTableReference' at 0x29e4510>
|
|
<Element '{http://datex2.eu/schema/2/2_0}duration' at 0x29e4750>
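
The part in braces in each tag is the namespace URI that ElementTree prepends to the element name. As a rough sketch of how to read such a listing, the snippet below splits each tag into its namespace and local name. The shortened sample document and the helper split_tag are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample using the same namespaces as the question.
sample = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    '<soap:Body>'
    '<d2LogicalModel xmlns="http://datex2.eu/schema/2/2_0">'
    '<duration>34</duration>'
    '</d2LogicalModel>'
    '</soap:Body>'
    '</soap:Envelope>'
)


def split_tag(tag):
    """Split '{uri}name' into (uri, name); uri is '' when there is no namespace."""
    if tag.startswith('{'):
        uri, _, name = tag[1:].partition('}')
        return uri, name
    return '', tag


root = ET.fromstring(sample)
for elem in root.iter():
    uri, name = split_tag(elem.tag)
    print(name, "in namespace", uri)
```

Elements under d2LogicalModel inherit its default namespace, which is why 'duration' must be looked up with the DATEX II URI even though the file never writes a prefix on it.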

Once we know the full tag names, including the namespace URI in braces, we simply iterate over the desired elements to read their attributes and text.

Code

import xml.etree.ElementTree as ET

data_file = 'soapData2.xml'
tree = ET.parse(data_file)
root = tree.getroot()

# Fully qualified tag names: the namespace URI goes in braces before the name
t1 = "{http://datex2.eu/schema/2/2_0}measurementSiteReference"
t2 = "{http://datex2.eu/schema/2/2_0}duration"

print("measurementSiteReference  : duration")
# zip pairs the nth site reference with the nth duration, in document order
for e1, e2 in zip(root.iter(t1), root.iter(t2)):
    print("{} : {}".format(e1.attrib['id'], e2.text))

Result

>>> 
measurementSiteReference  : duration
GEO03_D4T-RWS_T_0317_ID_324 : 34
GEO01_Z_RWSTRN054 : 56
RWS01_MONIBAS_0091hrr0350ra0 : 23
>>> 
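
Pairing the two iterators with zip relies on every 'siteMeasurements' block containing exactly one 'duration'. If that might not hold, one alternative is to loop over each 'siteMeasurements' element and search inside it. The following is a sketch under assumptions, not the approach used above: the shortened sample (including the site id SITE_WITHOUT_DURATION), the prefix 'd', and the helper site_durations are invented for illustration, and every block is assumed to contain a 'measurementSiteReference'.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample; the second site has no <duration>.
sample = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<d2LogicalModel xmlns="http://datex2.eu/schema/2/2_0">
<payloadPublication>
<siteMeasurements>
<measurementSiteReference id="GEO03_D4T-RWS_T_0317_ID_324"/>
<measuredValue><measuredValue><basicData><travelTime>
<duration>34</duration>
</travelTime></basicData></measuredValue></measuredValue>
</siteMeasurements>
<siteMeasurements>
<measurementSiteReference id="SITE_WITHOUT_DURATION"/>
</siteMeasurements>
<siteMeasurements>
<measurementSiteReference id="GEO01_Z_RWSTRN054"/>
<measuredValue><measuredValue><basicData><travelTime>
<duration>56</duration>
</travelTime></basicData></measuredValue></measuredValue>
</siteMeasurements>
</payloadPublication>
</d2LogicalModel>
</soap:Body>
</soap:Envelope>"""

# 'd' is an arbitrary local alias for the namespace used in the question's file.
NS = {'d': 'http://datex2.eu/schema/2/2_0'}


def site_durations(root):
    """Yield (site id, duration text or None) for each siteMeasurements block."""
    for site in root.iter('{http://datex2.eu/schema/2/2_0}siteMeasurements'):
        # Direct child holding the id attribute (assumed to be present).
        ref = site.find('d:measurementSiteReference', NS)
        # First duration anywhere below this site, or None if there is none.
        duration = site.find('.//d:duration', NS)
        yield ref.get('id'), (duration.text if duration is not None else None)


root = ET.fromstring(sample)
for site_id, duration in site_durations(root):
    print(site_id, ":", duration)
```

Because each duration is looked up inside its own site, a site without a duration yields None instead of silently shifting every later pairing.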
1 Comment

Yes, I missed the closing soap:Envelope tag when copying and pasting. Your solution worked perfectly, and it was especially helpful because your explanation made me understand how you arrived at it.
