
I have a simple XML document (actually ENML for Evernote) as follows:

<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">
<en-note>
   <div>Here is the Evernote logo:</div>
   <div>
      <en-media type="image/png" hash="a54fe8bcd146e20a8a5742834558543c" />
   </div>
   <div>
      <br />
   </div>
   <div>
      <en-todo />
      Task 1
   </div>
   <div>making it a bit harder</div>
   <div>
      <en-todo />
      Task 2 | 2016-12-31
   </div>
   <div>
      <br />
   </div>
   <div>
      This is another to-do
      <en-todo />
      in an awkward place
   </div>
</en-note>

I'm trying to use Xpath to access the text immediately after the en-todo tags. My code is:

parsed_note = ElementTree.fromstring(note_content)
for todo in parsed_note.findall('en-note//en-todo/following-sibling::text()[1]'):
    print todo.text

I've tested this using the XPath tester at freeformatter.com. It seems to work, but only when I remove the <!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd"> declaration from the XML; I assume this is a quirk of the tester. The output is:

Text='Task 1'
Text='Task 2 | 2016-12-31'
Text='in an awkward place'

This is exactly as anticipated and desired.

When I attempt to run the code in Python, I get: SyntaxError: prefix 'following-sibling' not found in prefix map.

I suspected this might be the same quirk as in the tester and removed the DOCTYPE declaration, but the same error persists.

I'm using the standard parser:

import defusedxml.lxml as lxml
from lxml import etree as ElementTree

Where am I going wrong - is my xpath statement flawed, or is there some other reason for this that I'm missing?

EDIT: @Tomalek has provided a solution that works, using the Python tail property instead of the full XPath expression. Given the comments from @alecxe that the docs referenced are not for lxml, I will leave this open in case anyone wants to venture an idea about why the original problem exists when there should be a full XPath implementation.

2 Answers


You should have used the xpath() method:

for todo in root.xpath('//en-note//en-todo/following-sibling::text()[1]'):
    print(todo)

Also note that I've added the // at the beginning and removed the .text: xpath() already returns the text nodes themselves as string-like objects, and those don't have a .text attribute.


2 Comments

My code already has 2x ::. Calling the xpath() method removes the error in my question, but returns no content.
@HO gotcha, updated with working code. Hope that helps.

Note: this answer is targeted at xml.etree.ElementTree. The similar, but more advanced lxml.etree module has full XPath support, but the method shown below works there as well.


Straight from the documentation, emphasis mine:

19.7.2. XPath support

This module provides limited support for XPath expressions for locating elements in a tree. The goal is to support a small subset of the abbreviated syntax; a full XPath engine is outside the scope of the module.
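To make the limitation concrete, here is a minimal sketch that reproduces the error from the question with the standard-library xml.etree.ElementTree. The sample string is a trimmed-down version of the note, and the variable names are only for illustration. The path parser behind findall() treats a colon inside a step as a namespace prefix separator, so the axis name ends up being looked up as a prefix. lxml's findall() appears to go through the same ElementPath-style parser, which would explain the identical message there, but that part is my reading rather than something the quoted docs state.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample based on the note in the question.
note_content = """<en-note>
   <div><en-todo />Task 1</div>
   <div>This is another to-do<en-todo />in an awkward place</div>
</en-note>"""

root = ET.fromstring(note_content)

try:
    root.findall('.//en-todo/following-sibling::text()[1]')
    error_message = None
except SyntaxError as exc:
    # findall() only understands a small XPath subset; the axis name
    # before "::" is treated as an undeclared namespace prefix.
    error_message = str(exc)

print(error_message)
```

So the expression itself is valid XPath 1.0; it is just being handed to a method that does not implement axes.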

You can work around it by doing part of the traversal in Python.

In this case it's particularly easy because there's a convenient tail property you can use. Other cases require more work.

parsed_note = ElementTree.fromstring(note_content)
for todo in parsed_note.findall('.//en-todo'):
    print(todo.tail)

You will have to .strip() whitespace from the returned value.
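As a rough, self-contained sketch of the tail approach with the whitespace stripped, assuming the standard-library xml.etree.ElementTree and a shortened copy of the note (the DOCTYPE line and the image div are left out for brevity, and the tasks list is just an illustrative name):

```python
import xml.etree.ElementTree as ET

# Shortened copy of the note from the question.
note_content = """<en-note>
   <div>
      <en-todo />
      Task 1
   </div>
   <div>making it a bit harder</div>
   <div>
      <en-todo />
      Task 2 | 2016-12-31
   </div>
   <div>
      This is another to-do
      <en-todo />
      in an awkward place
   </div>
</en-note>"""

root = ET.fromstring(note_content)

tasks = []
for todo in root.findall('.//en-todo'):
    # .tail holds the text between the end of this element and the next tag.
    # It is None when nothing follows, hence the "or ''" guard.
    tasks.append((todo.tail or '').strip())

print(tasks)
# ['Task 1', 'Task 2 | 2016-12-31', 'in an awkward place']
```

The same loop should behave the same way with lxml.etree, since both libraries expose .tail on elements.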

7 Comments

Thanks, apologies I hadn't seen this part of the documentation. I've tried your method - it removes the error, but doesn't seem to return anything (possibly because the text is not within the tags itself - I'll work some more to try to do this in python).
@HO the referenced docs are for xml.etree.ElementTree, while you are using lxml.etree, which has full XPath support.
A slight modification to your solution returns the correct result - I needed to change the xpath to './/en-todo' - the .tail function then worked as expected.
@alecxe Ah, I always end up confusing the two. It should work in both though.
@HO Added that bit to the answer. If you work with lxml.etree you should have better XPath support, as Alex says.