Python: Can't iterate sub-elements using ElementTree

Question

I have the following code to parse an XML but it just won't let me iterate through the children:

import urllib, urllib2, re, time, os
import xml.etree.ElementTree as ET 

def wgetUrl(target):
    try:
        req = urllib2.Request(target)
        req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3 Gecko/2008092417 Firefox/3.0.3')
        response = urllib2.urlopen(req)
        outtxt = response.read()
        response.close()
    except:
        return ''
    return outtxt

newUrl = 'http://feeds.rasset.ie/rteavgen/player/playlist?showId=10056467'

data = wgetUrl(newUrl)
tree = ET.fromstring(data)
#tree = ET.parse(data)
for elem in tree.iter('entry'):
    print elem.tag, elem.attrib

Now, if I remove 'entry' from the iter call I get output like this (why the URL?):

{http://www.w3.org/2005/Atom}entry {}
{http://www.w3.org/2005/Atom}id {}
{http://www.w3.org/2005/Atom}published {}
{http://www.w3.org/2005/Atom}updated {}
{http://www.w3.org/2005/Atom}title {'type': 'text'}

But if I write the iter statement like this, it still does not find the children of entry:

for elem in tree.iter('{http://www.w3.org/2005/Atom}entry'):
    print elem.tag, elem.attrib

I still only get the entry element on its own, not the children:

{http://www.w3.org/2005/Atom}entry {}

Any idea what I am doing wrong?

I have searched everywhere but can't figure this out... I am new to all this so sorry if it is something stupid.

Martijn Pieters · Accepted Answer · 2013-01-26 16:27:35Z


If you are parsing an Atom feed, you really want to use the feedparser library instead, which takes care of all these details and many more for you.

The {http://www.w3.org/2005/Atom} part is a namespace. You need to specify that namespace to select the entry tags:

for elem in tree.iterfind('ns:entry', {'ns': 'http://www.w3.org/2005/Atom'}):

where I used a dictionary to map the ns: prefix to the namespace, or you can use the same curly braces syntax:

for elem in tree.iterfind('{http://www.w3.org/2005/Atom}entry'):
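To make the two spellings concrete, here is a minimal sketch run against a small inline document rather than the real feed. The SAMPLE string and its contents are made up for illustration, and the snippet uses print() so it runs on Python 3 (the snippets in the question are Python 2).

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the feed; the real document is not reproduced here.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>urn:example:1</id>
    <title type="text">First</title>
  </entry>
  <entry>
    <id>urn:example:2</id>
    <title type="text">Second</title>
  </entry>
</feed>"""

ATOM = 'http://www.w3.org/2005/Atom'
root = ET.fromstring(SAMPLE)

# A bare tag name does not match elements that live in a namespace.
bare = list(root.iterfind('entry'))

# Prefix form: 'ns' is an arbitrary prefix, mapped to the URI by the dictionary.
prefixed = list(root.iterfind('ns:entry', {'ns': ATOM}))

# Curly-brace form: the namespace URI is spelled out inside the tag name.
braced = list(root.iterfind('{%s}entry' % ATOM))

print(len(bare), len(prefixed), len(braced))
```

Both namespaced forms select the same two elements here, while the bare name selects none. Note that iterfind() with a plain tag only looks at direct children of the element it is called on, which is enough in this sample because the entry elements sit directly under the root.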

Once you have the element, you still need to explicitly find its children:

for elem in tree.iterfind('{http://www.w3.org/2005/Atom}entry'):
    for child in elem:
        print child
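As a rough illustration of the difference, the sketch below collects the children of each entry and compares that with what iter() returns when called without a tag. The SAMPLE document is invented for the example and only loosely mirrors the tags shown in the question; the code uses print() so it runs on Python 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the feed, with a single entry.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>urn:example:1</id>
    <published>2013-01-26T16:00:00Z</published>
    <title type="text">First</title>
  </entry>
</feed>"""

ATOM_NS = '{http://www.w3.org/2005/Atom}'
root = ET.fromstring(SAMPLE)

child_tags = []
for entry in root.iter(ATOM_NS + 'entry'):
    # Iterating over an element yields its direct children only.
    for child in entry:
        # Drop the namespace part of the tag for display.
        child_tags.append(child.tag[len(ATOM_NS):])

# iter() with no tag walks the element itself plus every descendant,
# which is why removing 'entry' in the question printed all the tags.
all_tags = [el.tag.split('}', 1)[-1] for el in root.iter()]

print(child_tags)
print(all_tags)
```

Filtering with a tag name narrows the result to elements with that tag; it does not also return whatever is nested inside them, so the inner loop is what reaches id, published and title.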

edited Jan 26, 2013 at 16:27

answered Jan 26, 2013 at 16:08


8 Comments

mcquaim Over a year ago

Even if I use for elem in tree.iterfind('{w3.org/2005/Atom}entry'): print elem.tag, elem.attrib it still doesn't iterate down to the children e.g. (<id>, <published>, <updated>, <title> etc.). Any idea why?

Martijn Pieters Over a year ago

@user1995132: Yes, you are searching for entry only, it won't find the children then. You are asking for entry tags, not id or published or updated or title tags.

mcquaim Over a year ago

Even with tree.iter('{w3.org/2005/Atom}entry') it didn't work so when I saw your example I tried iterfind but same result..

Martijn Pieters Over a year ago

@user1995132: Just tested against that feed, I find the one element with iterfind() just fine.

Martijn Pieters Over a year ago

@user1995132: Did I mention that using feedparser would be much easier already?
