2

I'm trying to parse this XML. It's a YouTube feed. I'm working based on code in the tutorial. I want to get all the entry nodes that are nested under the feed.

from lxml import etree
root = etree.fromstring(text)
entries = root.xpath("/feed/entry")
print(entries)

For some reason entries is an empty list. Why?

7
  • What does "text" look like? Commented Aug 21, 2013 at 11:16
  • It's the XML from the link, read from a file. Commented Aug 21, 2013 at 11:17
  • The XML is a mess, can't you indent it properly? Commented Aug 21, 2013 at 11:20
  • @NilsWerner I updated the link to point to pretty-printed XML. Commented Aug 22, 2013 at 5:49
  • Can you mark my answer as being correct? Commented Aug 31, 2013 at 20:00

2 Answers

4

feed and all its children are actually in the http://www.w3.org/2005/Atom namespace. You need to tell your xpath that:

entries = root.xpath("/atom:feed/atom:entry", 
                     namespaces={'atom': 'http://www.w3.org/2005/Atom'})
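To see the difference in isolation, here is a minimal sketch. The SAMPLE document is a hypothetical stand-in for the YouTube feed, not the real data; only the namespace URI is taken from the answer above.

```python
from lxml import etree

# Hypothetical stand-in for the YouTube feed: a tiny Atom document.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample feed</title>
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
</feed>"""

# The prefix "atom" is chosen here; it does not have to appear in the XML.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

root = etree.fromstring(SAMPLE)

# In XPath 1.0 an unprefixed name matches only elements in no namespace,
# so this finds nothing even though <feed> and <entry> are present.
without_ns = root.xpath("/feed/entry")

# With the prefix bound to the Atom namespace, both entries are found.
with_ns = root.xpath("/atom:feed/atom:entry", namespaces=ATOM)

print(len(without_ns), len(with_ns))  # 0 2
print(root.tag)                       # {http://www.w3.org/2005/Atom}feed
```

Printing root.tag is a quick way to check which namespace a parsed document actually uses.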

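or, if you want a default namespace so that the names need no prefix, use findall() instead of xpath(). lxml's xpath() rejects an empty prefix with a TypeError, but find() and findall() accept None as the key. Since root is already the feed element, the path is relative to it:

entries = root.findall("entry",
                       namespaces={None: 'http://www.w3.org/2005/Atom'})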

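or, if you don't want to use prefixes at all, spell out the namespace in Clark notation. This is ElementPath syntax, so it works with find() and findall() but is not valid in xpath():

entries = root.findall("{http://www.w3.org/2005/Atom}entry")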
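Which of these spellings lxml accepts depends on the method being called. The following is a sketch under assumptions: SAMPLE is a hypothetical feed, and the None key for findall() assumes a reasonably recent lxml release.

```python
from lxml import etree

ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical sample document; root will be the <feed> element itself.
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
</feed>"""

root = etree.fromstring(SAMPLE)

# Clark notation ({uri}name) is ElementPath syntax: fine for findall().
clark = root.findall("{%s}entry" % ATOM_NS)

# A default namespace under the None key is also ElementPath-only.
default = root.findall("entry", namespaces={None: ATOM_NS})

# xpath() refuses an empty prefix ...
try:
    root.xpath("/feed/entry", namespaces={None: ATOM_NS})
    xpath_accepts_none = True
except TypeError:
    xpath_accepts_none = False

# ... and Clark notation is not a valid XPath expression.
try:
    root.xpath("/{%s}feed/{%s}entry" % (ATOM_NS, ATOM_NS))
    xpath_accepts_clark = True
except etree.XPathError:
    xpath_accepts_clark = False

print(len(clark), len(default), xpath_accepts_none, xpath_accepts_clark)
```

In short, findall() is the better fit when the prefix should be avoided, and xpath() with an explicit prefix when full XPath 1.0 is needed.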

Note that XPath 1.0 has no default or inherited namespace: an unprefixed name always means "no namespace", even when the node you are working with is itself in the Atom namespace. Every step therefore needs the prefix, but you can define the mapping once and reuse it for relative queries:

ns = {'atom': 'http://www.w3.org/2005/Atom'}
feed = root  # root is already the <feed> element

title = feed.xpath("atom:title", namespaces=ns)
entries = feed.xpath("atom:entry", namespaces=ns)
# etc...
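Whether the prefix can be dropped on lookups relative to the feed element is easy to check. This is a minimal sketch with a hypothetical sample document; the variable names are illustrative only.

```python
from lxml import etree

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample feed.
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample feed</title>
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
</feed>"""

feed = etree.fromstring(SAMPLE)

# Relative query without a prefix: still looks for a no-namespace <title>.
unprefixed = feed.xpath("title")

# The same relative queries with the prefix on every step.
titles = feed.xpath("atom:title/text()", namespaces=ATOM)
entry_titles = feed.xpath("atom:entry/atom:title/text()", namespaces=ATOM)

print(unprefixed)           # []
print(list(titles))         # ['Sample feed']
print(list(entry_titles))   # ['First', 'Second']
```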

6 Comments

I think you could drop this namespace only if you are the author of this XML file.
You should not "drop the namespace" as there is a reason why Atom feeds are using it. I've added a few more examples that could make your life easier.
Some XPath versions allow specifying "*" for any namespace, if I recall correctly?
You can use *[local-name()='feed'] to match an element feed of any namespace. That is considered to be bad practice though.
@misha Is there any way to avoid specifying the prefix? Yes, use XPath 2.0. But that's not easy from Python.
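The local-name() approach from the comments can be sketched as follows. The sample document and the urn:example:other namespace are hypothetical, added only to show why matching on the local name alone is usually discouraged.

```python
from lxml import etree

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed that also carries an <entry> from an unrelated namespace.
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:x="urn:example:other">
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
  <x:entry>not an Atom entry</x:entry>
</feed>"""

root = etree.fromstring(SAMPLE)

# Matches on the local name only, whatever the namespace.
any_ns = root.xpath("/*[local-name()='feed']/*[local-name()='entry']")

# Matches Atom entries only.
atom_only = root.xpath("/atom:feed/atom:entry", namespaces=ATOM)

print(len(any_ns), len(atom_only))  # 3 2
```

The extra match is the element from the other namespace, which the namespace-aware query correctly ignores.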
1

It's because of the namespace in the XML. Here is an explanation: http://www.edankert.com/defaultnamespaces.html#Conclusion.
