I have two issues parsing an XML file. First, I want to return only one set of properties at a time, i.e. only the property values under the first Process. Second, I want to return the second Source under the second Process. My code returns the Source under the first Sources and the first Source under the second Sources, but I cannot get the second Source to return.

The XML file looks like:

<!-- The description of the process -->
<Description>"This is a description"</Description>

<!-- info on process to be run -->
<Process expectFailure="false">
    <Code>Import</Code>
    <Sources>
        <Source>"Test Data"</Source>
    </Sources>
    <Destination>Buffered</Destination>
    <Properties>
        <Property code="format" value="CC"/>
        <Property code="Input" value="10N"/>
        <Property code="Method" value="BASIC"/>
        <Ppoperty code="Resolution" value="5"/>
        <Property code="Convention" value="LEFT"/>
        <Property code="Bounding" value="BUFFERED"/>
    </Properties>
</Process>

<!-- info on second process to be run (compare) -->
<Process>
    <Code>SurfaceCompare</Code>
    <Sources>
        <Source>expectedOutput</Source>
        <Source>Buffered</Source>
    </Sources>
    <Properties>
        <Property code="compare_designated" value="true"/>
        <Property code="compare_metadata" value="true"/>
        <Property code="metadata_type" value="OTHER"/>
    </Properties>
</Process>

and the code looks like:

from xml.etree import ElementTree

tree = ElementTree.parse("XML_example.xml")

description = tree.findtext("Description")
print(description)

for process in tree.findall('Process'):
    for source in process.findall('Sources'):
        source_text = source.findtext('Source')
        print(source_text)

#returns everything
for property in process.iter('Property'):
    print(property.attrib.get('code'))
    print(property.attrib.get('value'))

for process in tree.findall('Process'):
    for source in process.findall('Sources'):
        source = source.findtext('Source')
        print(source)

I've tried a lot of different ways of using the findall, find, iter, get, getiter methods. I am sure I am missing something but it has been a long day and for the life of me I can't see what I am missing.

I could also change how the XML is structured, but I know there must be a way to solve this as it stands, and it is gnawing at me.

Sample proper output for sources:

"Test Data"
expectedOutput
Buffered

Sample proper output 1 for properties:

format
CC
Input
10N
Method
BASIC
Convention
LEFT
Bounding
BUFFERED

Sample proper output 2:

compare_designated 
true
compare_metadata 
true
metadata_type 
OTHER

1 Answer

The simplest way to achieve what you want is to use find or findall with a path. iter works well when you only have a tag name, but in your case a path is more suitable.

Here is one way of doing it. Note that your sample is missing a root element, so I've added one in my code:

import xml.etree.ElementTree as ET
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3

s = '''<!-- The description of the process -->
<Description>"This is a description"</Description>

<!-- info on process to be run -->
<Process expectFailure="false">
    <Code>Import</Code>
    <Sources>
        <Source>"Test Data"</Source>
    </Sources>
    <Destination>Buffered</Destination>
    <Properties>
        <Property code="format" value="CC"/>
        <Property code="Input" value="10N"/>
        <Property code="Method" value="BASIC"/>
        <Ppoperty code="Resolution" value="5"/>
        <Property code="Convention" value="LEFT"/>
        <Property code="Bounding" value="BUFFERED"/>
    </Properties>
</Process>

<!-- info on second process to be run (compare) -->
<Process>
    <Code>SurfaceCompare</Code>
    <Sources>
        <Source>expectedOutput</Source>
        <Source>Buffered</Source>
    </Sources>
    <Properties>
        <Property code="compare_designated" value="true"/>
        <Property code="compare_metadata" value="true"/>
        <Property code="metadata_type" value="OTHER"/>
    </Properties>
</Process>'''

# wrap the sample in a <root> element so it is well-formed, then
# call getroot() on the parsed tree to get the root Element
tree = ET.parse(StringIO('<root>' + s + '</root>')).getroot()
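
If the XML is already held in a string, ET.fromstring returns the root Element directly, so neither StringIO nor getroot() is needed. A minimal sketch, assuming a shortened stand-in for the sample; the names snippet and root are illustrative and not part of the original code:

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the sample XML; like the original, it has no root element.
snippet = '''<Description>"This is a description"</Description>
<Process>
    <Code>Import</Code>
</Process>'''

# Wrapping the fragments in <root> makes the document well-formed;
# fromstring() returns the root Element directly.
root = ET.fromstring('<root>' + snippet + '</root>')

print(root.tag)                       # root
print(root.findtext('Description'))   # "This is a description"
print(root.findtext('Process/Code'))  # Import
```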

For example, the path ./Process[1]/Properties/Property leads from the first Process through Properties to each Property. With findall you get all matching Property nodes and can iterate over them:

# iterate over all Property nodes of the first Process and print their attributes
for p in tree.findall('./Process[1]/Properties/Property'):
    print(p.attrib)  # to get code/value, use p.attrib.get('code') etc.

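This gives you the attributes of every Property under the first Process/Properties. Resolution is absent because that element is spelled Ppoperty in the sample, so the path does not match it: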

{'code': 'format', 'value': 'CC'}
{'code': 'Input', 'value': '10N'}
{'code': 'Method', 'value': 'BASIC'}
{'code': 'Convention', 'value': 'LEFT'}
{'code': 'Bounding', 'value': 'BUFFERED'}
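
To print code and value on separate lines for one Process at a time, as in the sample output in the question, the position in the path can be made a parameter. This is only a sketch under assumptions: the helper property_pairs and the condensed xml_text are hypothetical names introduced for illustration, not part of the original code.

```python
import xml.etree.ElementTree as ET

# Condensed stand-in for the sample, already wrapped in a root element.
xml_text = '''<root>
<Process expectFailure="false">
    <Properties>
        <Property code="format" value="CC"/>
        <Property code="Input" value="10N"/>
    </Properties>
</Process>
<Process>
    <Properties>
        <Property code="compare_designated" value="true"/>
    </Properties>
</Process>
</root>'''

root = ET.fromstring(xml_text)


def property_pairs(root, position):
    """Return (code, value) tuples for the Process at the given 1-based position."""
    path = './Process[{}]/Properties/Property'.format(position)
    return [(p.get('code'), p.get('value')) for p in root.findall(path)]


first = property_pairs(root, 1)
second = property_pairs(root, 2)

# Print the first Process's properties, one item per line.
for code, value in first:
    print(code)
    print(value)
```

Positions in the path are 1-based, so Process[1] is the first Process and Process[2] the second.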

As another example, getting the text of just the second Source in the second Process is straightforward with a path and find:

print(tree.find('./Process[2]/Sources/Source[2]').text)
Buffered
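
The loop in the question printed only one Source per Sources because findtext returns the text of the first matching element only. To list every Source, findall with the full path returns them all in document order. A small sketch on a condensed stand-in for the sample (xml_text, first_only and all_sources are illustrative names):

```python
import xml.etree.ElementTree as ET

# Condensed stand-in for the sample, already wrapped in a root element.
xml_text = '''<root>
<Process>
    <Sources>
        <Source>"Test Data"</Source>
    </Sources>
</Process>
<Process>
    <Sources>
        <Source>expectedOutput</Source>
        <Source>Buffered</Source>
    </Sources>
</Process>
</root>'''

root = ET.fromstring(xml_text)

# findtext() stops at the first match, so it yields one Source per Process.
first_only = [p.findtext('Sources/Source') for p in root.findall('Process')]

# findall() returns every match, so no Source is skipped.
all_sources = [s.text for s in root.findall('./Process/Sources/Source')]

print(first_only)   # ['"Test Data"', 'expectedOutput']
print(all_sources)  # ['"Test Data"', 'expectedOutput', 'Buffered']
```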

I hope this gives you the idea of how to use them. Remember: to get a single node, use find; to get a list of nodes, use findall.

4 Comments

Thank you. I don't have time to try it now, but it looks promising. Will try when I get home.
@erics12512354, no worries, take your time. I'm sure once you get the idea of how the path works you'll have no problem solving the issue :)
It works. I didn't realize you could add positional indexes to the paths, and I was missing the ./ when I tried that. Thanks again.
@erics12512354, glad it helps. Yes it's a little similar to xpath :)
