3

I have Googled my pants off to no avail. What I am trying to do is very simple: I'd like to access the UniqueID value in the following XML contained in a string using ElementTree.

from xml.etree.ElementTree import fromstring

xml_string = """<ListObjectsResponse xmlns='http://www.example.com/dir/'>
        <Item>
                <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
        </Item>
</ListObjectsResponse>"""

NS = "http://www.example.com/dir/"

tree = fromstring(xml_string)

I know that I should use the fromstring method to parse the XML string, but I can't seem to identify how to access the UniqueID. I'm not certain how to use the find, findall, or findtext methods with respect to the namespace.

Any help is totally appreciated.

3
  • You've googled your pants off? A search for "ElementTree find namespace" turned up this as the second hit: "ElementTree: Working with Namespaces and Qualified Names" Commented Dec 14, 2011 at 20:37
  • 1
    I read that entire thing. Still didn't quite understand it to help with my dilemma I'm afraid. Commented Dec 14, 2011 at 20:55
  • 1
    @Raj I also rarely understand the explanations on the effbot site thoroughly Commented Dec 15, 2011 at 11:21

2 Answers 2

5

The following should get you going:

>>> tree.findall('*/*')
[<Element '{http://www.example.com/dir/}UniqueID' at 0x10899e450>]

This lists all the elements that are two levels below the root of your tree (the UniqueID element, in your case). You can, alternatively, find only the first element at this level, with tree.find(). You can then directly get the text contents of the UniqueID element:

>>> unique_id_elmt = tree.find('*/*')  # First (and only) element two levels below the root
>>> unique_id_elmt
<Element '{http://www.example.com/dir/}UniqueID' at 0x105ec9450>
>>> unique_id_elmt.text  # Text contained in UniqueID
'abcdefghijklmnopqrstuvwxyz0123456789'

Alternatively, you can directly find some precise element by specifying its full path:

>>> tree.find('{{{0}}}Item/{{{0}}}UniqueID'.format(NS))  # Tags are prefixed with NS
<Element '{http://www.example.com/dir/}UniqueID' at 0x10899ead0>
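
If only the text is needed, findtext() accepts the same fully qualified path and returns the text in one call. Here is a minimal sketch that reuses xml_string and NS from the question; the names path and unique_id are only illustrative:

```python
from xml.etree.ElementTree import fromstring

xml_string = """<ListObjectsResponse xmlns='http://www.example.com/dir/'>
        <Item>
                <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
        </Item>
</ListObjectsResponse>"""

NS = "http://www.example.com/dir/"

tree = fromstring(xml_string)

# Each tag in the path carries the namespace URI in braces.
path = '{{{0}}}Item/{{{0}}}UniqueID'.format(NS)

# findtext() returns the text of the first match, or None if nothing matches.
unique_id = tree.findtext(path)
print(unique_id)
```

Because findtext() returns None when the path does not match, a typo in the namespace shows up as None rather than as an exception.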

As Tomalak indicated, Fredrik Lundh's site might contain useful information. You may want to check how prefixes can be handled: there might in fact be a simpler way to handle them than spelling out the NS path explicitly as in the method above.
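
One candidate for such a simpler way is the optional namespaces mapping that find(), findall() and findtext() accept. This assumes a reasonably recent Python 3; I have not checked how far back it is supported. The prefix "d" below is an arbitrary label chosen for this sketch and does not come from the XML itself:

```python
from xml.etree.ElementTree import fromstring

xml_string = """<ListObjectsResponse xmlns='http://www.example.com/dir/'>
        <Item>
                <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
        </Item>
</ListObjectsResponse>"""

NS = "http://www.example.com/dir/"

tree = fromstring(xml_string)

# Map a prefix of our choosing to the namespace URI.
namespaces = {'d': NS}

# The prefix stands in for the {uri} part of each tag in the path.
unique_id = tree.findtext('d:Item/d:UniqueID', namespaces=namespaces)
print(unique_id)

# The same mapping works with findall(), here searching at any depth.
all_ids = [elem.text for elem in tree.findall('.//d:UniqueID', namespaces)]
print(all_ids)
```

The mapping keeps the paths short and puts the namespace URI in a single place, which helps when several lookups share the same namespace.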


2 Comments

Thanks for the example. In this case how do I print the actual UniqueID though?
@Raj: I added code that shows how to get the contents of the UniqueID element.
4

I know there will be some yells of horror and retaliatory downvotes for my answer, because I use the re module to analyse an XML string, but note that:

  • in the majority of cases, the following solution won't cause any problem

  • I wish the downvoters would give examples of cases that cause problems with my solution

  • I don't parse the string, taking 'parse' to mean 'building a tree and then searching that tree for what is wanted'; I analyse it: I directly find the wanted portion of text where it is

I don't claim that an XML string must always be analysed with re. In the majority of cases, an XML string should probably be parsed with a dedicated parser. I only say that in simple cases like this one, in which a simple and fast analysis is possible, it is easier to use the regex tool, which is, by the way, faster.

import re

xml_string = """<ListObjectsResponse xmlns='http://www.example.com/dir/'>
        <Item>
                <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
        </Item>
</ListObjectsResponse>"""

print(xml_string)
print('\n=================================\n')

# Non-greedy match of the text between the UniqueID tags
print(re.search(r'<UniqueID>(.+?)</UniqueID>', xml_string, re.DOTALL).group(1))

result

<ListObjectsResponse xmlns='http://www.example.com/dir/'>
        <Item>
                <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
        </Item>
</ListObjectsResponse>

=================================

abcdefghijklmnopqrstuvwxyz0123456789
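
A slightly more defensive variant of the same idea can be sketched as follows. The helper name extract_tag_text is hypothetical and introduced only for illustration. It returns None instead of raising when the tag is absent, and it still assumes the tag is written without attributes or a namespace prefix:

```python
import re

xml_string = """<ListObjectsResponse xmlns='http://www.example.com/dir/'>
        <Item>
                <UniqueID>abcdefghijklmnopqrstuvwxyz0123456789</UniqueID>
        </Item>
</ListObjectsResponse>"""


def extract_tag_text(xml_text, tag):
    """Return the text between <tag> and </tag>, or None if not found."""
    # re.escape guards against regex metacharacters in the tag name.
    pattern = r'<{0}>(.+?)</{0}>'.format(re.escape(tag))
    match = re.search(pattern, xml_text, re.DOTALL)
    return match.group(1) if match else None


found = extract_tag_text(xml_string, 'UniqueID')
missing = extract_tag_text(xml_string, 'NoSuchTag')
print(found)
print(missing)
```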

2 Comments

+1: This approach is indeed very good in some situations. It also has the advantage of being very legible (the format of the searched text is very obvious).
import reeeeeeeeee
