Using ElementTree to parse an XML string with a namespace

Question

I have Googled my pants off to no avail. What I am trying to do is very simple: I'd like to access the UniqueID value in the following XML contained in a string using ElementTree.

from xml.etree.ElementTree import fromstring

xml_string = """<ListObjectsResponse xmlns='http://www.example.com/dir/'>
        <Item>
                <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
        </Item>
</ListObjectsResponse>"""

NS = "http://www.example.com/dir/"

tree = fromstring(xml_string)

I know that I should use the fromstring method to parse the XML string, but I can't seem to identify how to access the UniqueID. I'm not certain how to use the find, findall, or findtext methods with respect to the namespace.

Any help is totally appreciated.

You've googled your pants off? A search for "ElementTree find namespace" turned up this as the second hit: "ElementTree: Working with Namespaces and Qualified Names" — Tomalak
– Tomalak, Commented Dec 14, 2011 at 20:37
I read that entire thing. Still didn't quite understand it to help with my dilemma I'm afraid. — Raj
– Raj, Commented Dec 14, 2011 at 20:55
@Raj I also rarely understand thoroughly the explanations on the effobot site — eyquem
– eyquem, Commented Dec 15, 2011 at 11:21

Eric O. Lebigot · Accepted Answer · 2011-12-15 13:30:42Z

5

The following should get you going:

>>> tree.findall('*/*')
[<Element '{http://www.example.com/dir/}UniqueID' at 0x10899e450>]

This lists all the elements that are two levels below the root of your tree (the UniqueID element, in your case). You can, alternatively, find only the first element at this level, with tree.find(). You can then directly get the text contents of the UniqueID element:

>>> unique_id_elmt = tree.find('*/*')  # First (and only) element two levels below the root
>>> unique_id_elmt
<Element '{http://www.example.com/dir/}UniqueID' at 0x105ec9450>
>>> unique_id_elmt.text  # Text contained in UniqueID
'abcdefghijklmnopqrstuvwxyz0123456789'

Alternatively, you can directly find some precise element by specifying its full path:

>>> tree.find('{{{0}}}Item/{{{0}}}UniqueID'.format(NS))  # Tags are prefixed with NS
<Element '{http://www.example.com/dir/}UniqueID' at 0x10899ead0>

As Tomalak indicated, Fredrik Lundh's site might contain useful information; you want to check how prefixes can be handled: there might in fact be a simpler way to handle them than by making explicit the NS path in the method above.

edited Dec 15, 2011 at 13:30

answered Dec 14, 2011 at 20:44

Eric O. Lebigot

95.1k49 gold badges223 silver badges263 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Raj Over a year ago

Thanks for the example. In this case how do I print the actual UniqueID though?

Eric O. Lebigot Over a year ago

@Raj: I added code that shows how to get the contents of the UniqueID element.

Eric O. Lebigot · Accepted Answer · 2013-05-08 03:07:23Z

4

I know there will be some yells of horror and downvotes for my answer as retaliation, because I use module re to analyse an XML string, but note that:

in the majority of cases , the following solution won't cause any problem
I wish the downvoters will give examples of cases causing problems with my solution
I don't parse the string, taking the word 'parse' in the sense of 'obtaining a tree before analysing the tree to find what is searched'; I analyse it: I find directly the whished portion of text where it is

I don't pretend that an XML string must always been analysed with the help of re. It is probably in the majority of cases that an XML string must be parsed with a dedicated parser. I only say that in simple cases like this one , in which a simple and fast analyze is possible, it is easier to use the regex tool, which is, by the way, faster.

import re

xml_string = """<ListObjectsResponse xmlns='http://www.example.com/dir/'>
        <Item>
                <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
        </Item>
</ListObjectsResponse>"""

print xml_string
print '\n=================================\n'

print re.search('<UniqueID>(.+?)</UniqueID>', xml_string, re.DOTALL).group(1)

result

<ListObjectsResponse xmlns='http://www.example.com/dir/'>
        <Item>
                <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
        </Item>
</ListObjectsResponse>

=================================

abcdefghijklmnopqrstuvwxyz0123456789

edited May 8, 2013 at 3:07

Eric O. Lebigot

95.1k49 gold badges223 silver badges263 bronze badges

answered Dec 15, 2011 at 10:53

eyquem

27.7k7 gold badges43 silver badges46 bronze badges

2 Comments

Eric O. Lebigot Over a year ago

+1: This approach is indeed very good in some situations. Is also has the advantage of being very legible (the format of the searched text is very obvious).

admirabilis Over a year ago

import reeeeeeeeee

Collectives™ on Stack Overflow

Using ElementTree to parse an XML string with a namespace

2 Answers 2

2 Comments

2 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related