I have a complex XML file that I'm trying to extract data from.

<?xml version="1.0" ?>
<root xmlns="something.something.com">
    <Save>
        <AdditionalInfo>
            <Name></Name>
            <Time></Time>
            <UtilityVersion></UtilityVersion>
            <XMLVersion></XMLVersion>
            <PluginName></PluginName>
            <ClassName></ClassName>
        </AdditionalInfo>
        <Data>
            <session>
                <xyDataObjects>
                    <xyData Key="'info'" ObjectType="moreinfo" Type="evenmoreinfo">
                        <axis1QuantityType ObjectType="guesswhat" Type="info!">
                            <label></label>
                            <type></type>
                        </axis1QuantityType>
    ... and so on and so on

The file contains multiple blocks, each delimited by <Save> and </Save>, and the information I'm looking for can be nested as deep as the <label> element, or even deeper.

ElementTree.iter() seemed to be my solution, as it would iterate through every Save block and find the <label> info I am looking for, but unfortunately it doesn't accept a namespace argument.

What are my other options? I'm trying to keep my code flexible, since I foresee that the structure of the XML file could change in the future, and simple, so I would rather not implement something like:

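import xml.etree.ElementTree as ET

tree = ET.parse('dblank.xml')
root = tree.getroot()
# One slot per <Save> block; the index chain hard-codes the path down to <label>
Array = [None] * len(root)
for i in range(len(root)):
    Array[i] = root[i][1][0][0][0][0][0].text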
  • You could use XPath queries to find the information you want. What have you tried so far? Commented May 1, 2020 at 23:19
  • "find the info I am looking for". What information are you looking for exactly? You can still use iter(); you just have to take the namespace of an element into account when checking a condition. Or you can use findall() with a wildcard. See stackoverflow.com/a/61154644/407651 Commented May 2, 2020 at 5:13

1 Answer

When you process XML with namespaces, you must specify the namespaces used. To this end, I:

  • defined an ns variable (a dictionary) with namespace prefixes as keys and full namespace URIs as values (a single dictionary entry here),
  • passed this variable as the second argument to findall.

Note also that the first argument of findall contains the some: prefix as the initial part of the element name.

Try the following code:

import xml.etree.ElementTree as et

tree = et.parse('Input.xml')
root = tree.getroot()
ns = {'some': 'something.something.com'}

for elem in root.findall('.//some:label', ns):
    print(elem.text)

Of course, this is only an example of how to refer to an existing element. Change it according to your needs.
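
As a rough sketch of how this could be adapted, the snippet below shows three ways to reach every label regardless of nesting depth: iter() with the namespace written into the tag name as {uri}tag, findall() with the prefix dictionary grouped per Save block, and the {*} namespace wildcard (available in Python 3.8 and later). The SAMPLE string and its label values ("Time", "Force") are made up for illustration and only mimic the structure shown in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data mirroring the question's structure (values are invented).
SAMPLE = """<?xml version="1.0" ?>
<root xmlns="something.something.com">
    <Save>
        <Data><session><xyDataObjects>
            <xyData Key="'info'">
                <axis1QuantityType><label>Time</label><type>1</type></axis1QuantityType>
            </xyData>
        </xyDataObjects></session></Data>
    </Save>
    <Save>
        <Data><session><xyDataObjects>
            <xyData Key="'more'">
                <axis1QuantityType><label>Force</label><type>2</type></axis1QuantityType>
            </xyData>
        </xyDataObjects></session></Data>
    </Save>
</root>"""

NS_URI = "something.something.com"
ns = {"some": NS_URI}  # the prefix name is arbitrary; only the URI must match

root = ET.fromstring(SAMPLE)

# Option 1: iter() takes no namespace map, but accepts the qualified name {uri}tag.
labels_iter = [el.text for el in root.iter(f"{{{NS_URI}}}label")]

# Option 2: findall() with the prefix map, keeping the labels grouped per <Save> block.
labels_per_save = [
    [el.text for el in save.findall(".//some:label", ns)]
    for save in root.findall("some:Save", ns)
]

# Option 3 (Python 3.8+): {*} matches the tag name in any namespace.
labels_wild = [el.text for el in root.findall(".//{*}label")]

print(labels_iter)
print(labels_per_save)
print(labels_wild)
```

None of these depend on how deep the label sits inside a Save block, so they should keep working if intermediate elements are added or removed later.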

1 Comment

XPath seems to be the way to do it. A quick test confirmed this can easily work. Thanks for your input.
