I'm trying to read through an XML feed I'm receiving, but I can't access the specific elements. I'm using Python, and the Python documentation is unclear about what I should use.

Here is the feed:

<title>More eagle</title>
<summary>http://www.181.fm/winamp.plsstation=181eagle&amp;style=&amp;description=The%20Eagle%20(Classic ...</summary> 
<link rel="alternate" href="http://mail.google.com/mail [email protected]&amp;message_id=12995390f36c310b&amp;view=conv&amp;extsrc=atom" type="text/html" />
<modified>2010-07-02T22:13:51Z</modified>
<issued>2010-07-02T22:13:51Z</issued>
<id>tag:gmail.google.com,2004:1340194246143783179 </id>

And here is my current function:

def parse_xml(feed):
    feedxml = minidom.parseString(feed)
    name = feedxml.getElementsByTagName('name')
    subject = feedxml.getElementsByTagName('title')
    contents = feedxml.getElementsByTagName('summary')
    return name + "\n" + subject + "\n" + contents
  • To clarify, I need to get the text between the element tags. Right now I'm getting the following:
    <xml.dom.minidom.Document instance at 0x14b7c10>
    [<DOM Element: name at 0x14dd210>]
    [<DOM Element: title at 0x14d8760>, <DOM Element: title at 0x14d8c38>]
    Commented Jul 6, 2010 at 17:25

2 Answers

getElementsByTagName() returns a list of elements. So if you want the first (or only) one, you need to use getElementsByTagName('name')[0].

But this is an element object, not the text enclosed by it (which I presume you're interested in).

So you probably need to do something like this:

nametag = feedxml.getElementsByTagName('name')[0]
nametag.normalize()
name = nametag.firstChild.data
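
As a minimal sketch of the difference between the list, the element, and its text, the snippet below parses a small made-up document. The `sample` string and the `first_text` helper are assumptions for illustration, not part of the original feed; the helper also guards against a missing or empty element, which the three lines above do not.

```python
from xml.dom import minidom

# Hypothetical sample document, invented for illustration.
sample = "<entry><name>Alice</name><title>More eagle</title></entry>"

def first_text(doc, tag_name):
    """Return the text of the first <tag_name> element, or "" if missing or empty."""
    elements = doc.getElementsByTagName(tag_name)  # a list-like NodeList
    if not elements:
        return ""
    element = elements[0]
    element.normalize()  # merge adjacent text nodes into one
    child = element.firstChild
    if child is None or child.nodeType != child.TEXT_NODE:
        return ""
    return child.data

doc = minidom.parseString(sample)
print(len(doc.getElementsByTagName("name")))  # number of matching elements
print(first_text(doc, "name"))
print(first_text(doc, "title"))
```

Returning an empty string for a missing tag is only one possible choice; raising an error may suit a feed where every field is mandatory.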

To get the text of an element you have to do something like this:

  from xml.dom import minidom

  def getElementText(node, tagName):
    result = ""  # returned as-is when no element matches
    for element in node.getElementsByTagName(tagName):
      result = ""  # handle empty elements; the last matching element wins
      for tnode in element.childNodes:
        if tnode.nodeType == tnode.TEXT_NODE:
          result += tnode.data  # join split text nodes instead of keeping only the last
    return result

  def parse_xml(feed):
    feedxml = minidom.parseString(feed)
    name = getElementText(feedxml, 'name')
    subject = getElementText(feedxml, 'title')
    contents = getElementText(feedxml, 'summary')
    return name + "\n" + subject + "\n" + contents
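
The comment on the question shows two title elements in the parsed document, so searching the whole document by tag name can pick up a title that does not belong to the message. One way around that is to search inside each entry separately. The sketch below assumes an Atom-like layout with `<entry>` elements and a `<name>` nested under `<author>`; `SAMPLE_FEED`, `element_text`, and `parse_entries` are hypothetical names, and the real feed's structure may differ.

```python
from xml.dom import minidom

# Hypothetical feed, invented for illustration; the real feed's layout may differ.
SAMPLE_FEED = """<feed>
  <title>Inbox</title>
  <entry>
    <title>More eagle</title>
    <summary>First summary</summary>
    <author><name>Alice</name></author>
  </entry>
  <entry>
    <title>Second message</title>
    <summary></summary>
    <author><name>Bob</name></author>
  </entry>
</feed>"""

def element_text(parent, tag_name):
    """Concatenate the text nodes of the first <tag_name> under parent ("" if none)."""
    matches = parent.getElementsByTagName(tag_name)
    if not matches:
        return ""
    return "".join(
        child.data
        for child in matches[0].childNodes
        if child.nodeType == child.TEXT_NODE
    )

def parse_entries(feed):
    """Return one newline-joined name/subject/contents string per <entry> element."""
    doc = minidom.parseString(feed)
    entries = []
    for entry in doc.getElementsByTagName("entry"):
        fields = [element_text(entry, tag) for tag in ("name", "title", "summary")]
        entries.append("\n".join(fields))
    return entries

for text in parse_entries(SAMPLE_FEED):
    print(text)
    print("---")
```

Scoping the lookup to each entry keeps the feed-level title out of the results, since getElementsByTagName() only searches the descendants of the node it is called on.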
