How to retrieve a specific element from XML with Python

Question

I'm trying to read through an XML feed I'm getting, but I can't access the specific elements. I'm using Python, and the Python documentation is really unclear about what I should use.

Here is the feed:

<title>More eagle</title>
<summary>http://www.181.fm/winamp.plsstation=181eagle&amp;style=&amp;description=The%20Eagle%20(Classic ...</summary> 
<link rel="alternate" href="http://mail.google.com/mail [email protected]&amp;message_id=12995390f36c310b&amp;view=conv&amp;extsrc=atom" type="text/html" />
<modified>2010-07-02T22:13:51Z</modified>
<issued>2010-07-02T22:13:51Z</issued>
<id>tag:gmail.google.com,2004:1340194246143783179 </id>

And here is my current function:

def parse_xml(feed):
    feedxml = minidom.parseString(feed)
    name = feedxml.getElementsByTagName('name')
    subject = feedxml.getElementsByTagName('title')
    contents = feedxml.getElementsByTagName('summary')
    return name + "\n" + subject + "\n" + contents

To clarify, I need to get the text between the element tags. Right now I'm getting the following: <xml.dom.minidom.Document instance at 0x14b7c10> [<DOM Element: name at 0x14dd210>] [<DOM Element: title at 0x14d8760>, <DOM Element: title at 0x14d8c38>]
– SachaK, Commented Jul 6, 2010 at 17:25

Tim Pietzcker · Accepted Answer · 2010-07-06 17:12:35Z

1

getElementsByTagName() returns a list of elements. So if you want the first (or only) one, you need to use getElementsByTagName('name')[0].

But this is an element object, not the text enclosed by it (which I presume you're interested in).

So you probably need to do something like this:

nametag = feedxml.getElementsByTagName('name')[0]
nametag.normalize()
name = nametag.firstChild.data
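
As a self-contained illustration of the three lines above, here is a minimal sketch. The feed in SAMPLE_FEED is a hypothetical, trimmed-down stand-in (the excerpt in the question does not show where the name element sits), so treat its structure and the sender name as assumptions.

```python
from xml.dom import minidom

# Hypothetical, trimmed-down feed; the real feed contains more elements.
SAMPLE_FEED = """<feed>
  <title>Inbox</title>
  <entry>
    <title>More eagle</title>
    <author><name>Example Sender</name></author>
  </entry>
</feed>"""

feedxml = minidom.parseString(SAMPLE_FEED)

# getElementsByTagName() returns a list-like object, even for a single match.
names = feedxml.getElementsByTagName('name')

nametag = names[0]              # an Element object, not a string
nametag.normalize()             # merge adjacent text nodes into one
name = nametag.firstChild.data  # the text between <name> and </name>

print(name)  # Example Sender
```

Note that firstChild is None for an empty element such as <name></name>, so real code would probably want to check for that before reading .data.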


m0rganic · Accepted Answer · 2011-02-09 02:27:38Z

1

To get the text of an element you have to do something like this:

  from xml.dom import minidom

  def getElementText(node, tagName):
    result = ""  # returned as-is when no element matches tagName
    for element in node.getElementsByTagName(tagName):
      result = ""  # handle empty elements
      for tnode in element.childNodes:
        if tnode.nodeType == tnode.TEXT_NODE:
          result = tnode.data
    return result  # text of the last matching element

  def parse_xml(feed):
    feedxml = minidom.parseString(feed)
    name = getElementText(feedxml, 'name')
    subject = getElementText(feedxml, 'title')
    contents = getElementText(feedxml, 'summary')
    return name + "\n" + subject + "\n" + contents
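
The output quoted in the question shows two title elements, which suggests the feed has a feed-level title as well as one per message. One possible way to keep them apart is to look up the tags inside each entry rather than on the whole document. The sketch below assumes the messages are wrapped in entry elements; that wrapper, the sample feed, and the helper name first_text are illustrative assumptions, not taken from the question.

```python
from xml.dom import minidom

def first_text(node, tag_name):
    """Return the text of the first <tag_name> below node, or "" if absent."""
    matches = node.getElementsByTagName(tag_name)
    if len(matches) == 0:
        return ""
    # Join every text child so that split text nodes are not lost.
    return "".join(child.data for child in matches[0].childNodes
                   if child.nodeType == child.TEXT_NODE)

# Hypothetical feed: one feed-level title and two entries.
SAMPLE_FEED = """<feed>
  <title>Inbox</title>
  <entry><title>More eagle</title><summary>first</summary></entry>
  <entry><title>Second mail</title><summary></summary></entry>
</feed>"""

feedxml = minidom.parseString(SAMPLE_FEED)
entries = feedxml.getElementsByTagName('entry')

# Searching from each entry skips the feed-level <title>.
subjects = [first_text(entry, 'title') for entry in entries]
summaries = [first_text(entry, 'summary') for entry in entries]

print(subjects)   # ['More eagle', 'Second mail']
print(summaries)  # ['first', '']
```

Calling getElementsByTagName() on an element instead of the document limits the search to that element's descendants, which is why the feed-level title does not show up in subjects.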
