Python / ElementTree: following-sibling error (working in xpath tester)

Question

I have a simple XML document (actually ENML for Evernote) as follows:

<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">
<en-note>
   <div>Here is the Evernote logo:</div>
   <div>
      <en-media type="image/png" hash="a54fe8bcd146e20a8a5742834558543c" />
   </div>
   <div>
      <br />
   </div>
   <div>
      <en-todo />
      Task 1
   </div>
   <div>making it a bit harder</div>
   <div>
      <en-todo />
      Task 2 | 2016-12-31
   </div>
   <div>
      <br />
   </div>
   <div>
      This is another to-do
      <en-todo />
      in an awkward place
   </div>
</en-note>

I'm trying to use XPath to access the text immediately after each en-todo tag. My code is:

parsed_note = ElementTree.fromstring(note_content)
for todo in parsed_note.findall('en-note//en-todo/following-sibling::text()[1]'):
    print todo.text

I've tested this using the XPath tester at freeformatter.com. It seems to work, but only when I remove the <!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd"> declaration from the XML, which I assume is a quirk of the tester. The output is:

Text='Task 1'
Text='Task 2 | 2016-12-31'
Text='in an awkward place'

This is exactly as anticipated and desired.

When I attempt to run the code in Python, I get: SyntaxError: prefix 'following-sibling' not found in prefix map.

I suspected this might be the same quirk as in the tester and removed the DOCTYPE declaration, but the same error persists.

I'm using the standard parser:

import defusedxml.lxml as lxml
from lxml import etree as ElementTree

Where am I going wrong? Is my XPath expression flawed, or is there some other reason for this that I'm missing?

EDIT: @Tomalak has provided a solution that works, using the Python tail property instead of the full XPath. Given the comments from @alecxe that the docs referenced are not for lxml, I will leave this open in case anyone wants to venture an idea about why the original problem exists when there should be a full XPath implementation.

alecxe · Accepted Answer · 2016-12-31 15:34:28Z

3

You should have used the xpath() method:

for todo in parsed_note.xpath('//en-note//en-todo/following-sibling::text()[1]'):
    print todo

Also note that I've added the // at the beginning and removed the .text: the results are already text nodes, and they don't have a .text attribute.
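For reference, here is a minimal self-contained sketch of the same idea, assuming lxml is installed. In lxml, findall() only accepts the limited ElementPath syntax, while xpath() evaluates full XPath 1.0, which is why the following-sibling axis only works through xpath(). The NOTE sample is trimmed from the question, and the helper name todo_texts is made up for illustration.

```python
from lxml import etree

# Trimmed copy of the note from the question (sample data for illustration).
NOTE = """<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">
<en-note>
   <div>
      <en-todo />
      Task 1
   </div>
   <div>making it a bit harder</div>
   <div>
      <en-todo />
      Task 2 | 2016-12-31
   </div>
   <div>
      This is another to-do
      <en-todo />
      in an awkward place
   </div>
</en-note>"""


def todo_texts(xml_text):
    """Return the stripped text that directly follows each en-todo element."""
    root = etree.fromstring(xml_text)
    # xpath() returns text nodes as string-like objects, so there is no .text
    # attribute to read; strip() can be called on each result directly.
    nodes = root.xpath('//en-todo/following-sibling::text()[1]')
    return [node.strip() for node in nodes]


print(todo_texts(NOTE))
# ['Task 1', 'Task 2 | 2016-12-31', 'in an awkward place']
```

The strip() call removes the indentation and newlines that surround each text node in the pretty-printed note.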

edited Dec 31, 2016 at 15:34

answered Dec 31, 2016 at 15:20

alecxe


2 Comments

Harry Over a year ago

My code already has 2x ::. Calling the xpath() method removes the error in my question, but returns no content.

alecxe Over a year ago

@HO gotcha, updated with a working code. Hope that helps.

Tomalak · Accepted Answer · 2016-12-31 15:37:33Z

1

Note: this answer is targeted at xml.etree.ElementTree. The similar, but more advanced lxml.etree module has full XPath support, but the method shown below works there as well.

Straight from the documentation, emphasis mine:

19.7.2. XPath support

This module provides limited support for XPath expressions for locating elements in a tree. The goal is to support a small subset of the abbreviated syntax; a full XPath engine is outside the scope of the module.

You can work around it by doing part of the traversal in Python.

In this case it's particularly easy because there's a convenient tail property you can use. Other cases require more work.

parsed_note = ElementTree.fromstring(note_content)
for todo in parsed_note.findall('.//en-todo'):
    print todo.tail

You will have to .strip() whitespace from the returned value.
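A rough sketch of the tail approach with the whitespace handling included, using only the standard library's xml.etree.ElementTree. The NOTE sample is trimmed from the question, and the helper name todo_tails is hypothetical. The `or ''` guard is an added assumption to cover an en-todo that has no text after it, in which case tail is None.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the note from the question (sample data for illustration).
NOTE = """<en-note>
   <div>
      <en-todo />
      Task 1
   </div>
   <div>making it a bit harder</div>
   <div>
      <en-todo />
      Task 2 | 2016-12-31
   </div>
   <div>
      This is another to-do
      <en-todo />
      in an awkward place
   </div>
</en-note>"""


def todo_tails(xml_text):
    """Return the stripped tail text of every en-todo element."""
    root = ET.fromstring(xml_text)
    # tail is the text between an element's end tag and the next tag;
    # it is None when nothing follows, hence the `or ''` fallback.
    return [(todo.tail or '').strip() for todo in root.findall('.//en-todo')]


print(todo_tails(NOTE))
# ['Task 1', 'Task 2 | 2016-12-31', 'in an awkward place']
```

The leading './/' matters here: the path is evaluated relative to the root element, which is en-note itself, so a path that starts with 'en-note' would look for a nested en-note child and match nothing.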

edited Dec 31, 2016 at 15:37

answered Dec 31, 2016 at 15:19

Tomalak


7 Comments

Harry Over a year ago

Thanks, apologies I hadn't seen this part of the documentation. I've tried your method - it removes the error, but doesn't seem to return anything (possibly because the text is not within the tags itself - I'll work some more to try to do this in python).

alecxe Over a year ago

@HO the referenced docs are for xml.etree.ElementTree, while you are using lxml.etree, which has full XPath support.

Harry Over a year ago

A slight modification to your solution returns the correct result - I needed to change the xpath to './/en-todo' - the .tail function then worked as expected.

Tomalak Over a year ago

@alecxe Ah, I always end up confusing the two. It should work in both though.

Tomalak Over a year ago

@HO Added that bit to the answer. If you work with lxml.etree you should have better XPath support, as Alex says.
