Parse XML file with namespace with Python

Question

I have a complex XML file that I'm trying to extract data from.

<?xml version="1.0" ?>
<root xmlns="something.something.com">
    <Save>
        <AdditionalInfo>
            <Name></Name>
            <Time></Time>
            <UtilityVersion></UtilityVersion>
            <XMLVersion></XMLVersion>
            <PluginName></PluginName>
            <ClassName></ClassName>
        </AdditionalInfo>
        <Data>
            <session>
                <xyDataObjects>
                    <xyData Key="'info'" ObjectType="moreinfo" Type="evenmoreinfo">
                        <axis1QuantityType ObjectType="guesswhat" Type="info!">
                            <label></label>
                            <type></type>
                        </axis1QuantityType>
    ... and so on and so on

The file contains multiple blocks delimited by <Save> and </Save> tags, and the information I'm looking for can be nested as deep as the <label> element, or even deeper.

ElementTree's iter() seemed to be my solution, since it would iterate through every Save block and find the <label> elements I am looking for, but unfortunately it doesn't accept a namespace argument.

What are my other options? I'm trying to keep my code flexible, since I foresee that the structure of the XML file could change in the future, and simple, so I would rather not implement something like:

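import xml.etree.ElementTree as ET

tree = ET.parse('dblank.xml')
root = tree.getroot()
values = []
for save in root:
    # Hard-coded positional indexing: breaks as soon as the structure changes
    values.append(save[1][0][0][0][0][0].text)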

You could use XPath queries to find the information you want. What have you tried so far?
– larsks, Commented May 1, 2020 at 23:19
"find the info I am looking for". What information are you looking for exactly? You can still use iter(); you just have to take the namespace of an element into account when checking a condition. Or you can use findall() with a wildcard. See stackoverflow.com/a/61154644/407651
– mzjn, Commented May 2, 2020 at 5:13
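
The two approaches mentioned in the last comment could look roughly like the sketch below. It assumes the namespace URI from the question (something.something.com) and uses a small inline sample document whose values are invented for illustration. The {*} wildcard in findall() requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Inline sample: a trimmed-down stand-in for the file in the question
# (element values are invented for illustration).
SAMPLE = """<?xml version="1.0" ?>
<root xmlns="something.something.com">
    <Save>
        <Data><session><xyDataObjects><xyData>
            <axis1QuantityType><label>Time</label><type>1</type></axis1QuantityType>
        </xyData></xyDataObjects></session></Data>
    </Save>
    <Save>
        <Data><session><xyDataObjects><xyData>
            <axis1QuantityType><label>Force</label><type>2</type></axis1QuantityType>
        </xyData></xyDataObjects></session></Data>
    </Save>
</root>"""

NS_URI = "something.something.com"
root = ET.fromstring(SAMPLE)

# Option 1: iter() with the tag written in Clark notation, {uri}localname.
labels_iter = [el.text for el in root.iter(f"{{{NS_URI}}}label")]

# Option 2: findall() with the {*} wildcard, which matches any namespace.
labels_wild = [el.text for el in root.findall(".//{*}label")]

print(labels_iter)
print(labels_wild)
```

Both options search at any depth, so neither depends on how deeply the label elements are nested.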

Valdi_Bo · Accepted Answer · 2020-05-03 13:38:59Z


When you process XML that uses namespaces, you must specify those namespaces in your queries. To do that, I:

defined a variable ns (a dictionary) that maps namespace prefixes (keys) to full namespace URIs (values), with a single entry here,
passed this variable as the second argument to findall.

Note also that the first argument of findall uses the some: prefix in front of the element name.

Try the following code:

import xml.etree.ElementTree as et

tree = et.parse('Input.xml')
root = tree.getroot()
ns = {'some': 'something.something.com'}

for elem in root.findall('.//some:label', ns):
    print(elem.text)

Of course, this is only an example of how to refer to an existing element. Change it according to your needs.
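
Since the question mentions several Save blocks, one possible adaptation is to group the labels by block. The following is only a sketch under assumptions: the inline sample document and its values are invented for illustration, and labels_per_save is a hypothetical helper name, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Inline sample with two Save blocks (values invented for illustration).
SAMPLE = """<root xmlns="something.something.com">
    <Save>
        <Data><session><xyDataObjects>
            <xyData Key="a"><axis1QuantityType><label>Time</label></axis1QuantityType></xyData>
            <xyData Key="b"><axis1QuantityType><label>Force</label></axis1QuantityType></xyData>
        </xyDataObjects></session></Data>
    </Save>
    <Save>
        <Data><session><xyDataObjects>
            <xyData Key="c"><axis1QuantityType><label>Angle</label></axis1QuantityType></xyData>
        </xyDataObjects></session></Data>
    </Save>
</root>"""

# Prefix-to-URI mapping; the prefix name 'some' is arbitrary.
NS = {"some": "something.something.com"}


def labels_per_save(root):
    """Return one list of <label> texts per <Save> block, in document order."""
    result = []
    for save in root.findall("some:Save", NS):
        # './/' searches at any depth, so the nesting below <Save> may change.
        result.append([el.text for el in save.findall(".//some:label", NS)])
    return result


root = ET.fromstring(SAMPLE)
for index, labels in enumerate(labels_per_save(root)):
    print(index, labels)
```

Because only the Save and label element names are hard-coded, intermediate elements can be added or removed without changing the code.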


1 Comment

HotFuzz Over a year ago

XPath seems to be the way to do it. A quick test confirmed this can easily work. Thanks for your input.
