I have a simple XML document (actually ENML for Evernote) as follows:
<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">
<en-note>
<div>Here is the Evernote logo:</div>
<div>
<en-media type="image/png" hash="a54fe8bcd146e20a8a5742834558543c" />
</div>
<div>
<br />
</div>
<div>
<en-todo />
Task 1
</div>
<div>making it a bit harder</div>
<div>
<en-todo />
Task 2 | 2016-12-31
</div>
<div>
<br />
</div>
<div>
This is another to-do
<en-todo />
in an awkward place
</div>
</en-note>
I'm trying to use XPath to access the text immediately after each en-todo tag. My code is:
parsed_note = ElementTree.fromstring(note_content)
for todo in parsed_note.findall('en-note//en-todo/following-sibling::text()[1]'):
    print todo.text
I've tested the expression with the XPath tester at freeformatter.com. It seems to work, but only when I remove the <!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd"> declaration from the XML, which I assume is a quirk of the tester. The output is:
Text='Task 1'
Text='Task 2 | 2016-12-31'
Text='in an awkward place'
This is exactly as anticipated and desired.
When I attempt to run the code in Python, I get: SyntaxError: prefix 'following-sibling' not found in prefix map.
I suspected this might be the same quirk as in the tester and removed the DOCTYPE declaration, but the same error persists.
I'm using lxml, with these imports:
import defusedxml.lxml as lxml
from lxml import etree as ElementTree
Where am I going wrong? Is my XPath expression flawed, or is there some other reason that I'm missing?
EDIT: @Tomalek has provided a solution that works, using the tail attribute of each element instead of the full XPath expression. Given the comments from @alecxe that the docs referenced are not for lxml, I will leave this open in case anyone wants to venture an idea about why the original problem exists when there should be a full XPath implementation.
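For anyone landing here, below is a minimal sketch of both routes, assuming lxml is installed and using a shortened copy of the note above (the names tails and texts are just for illustration). As far as I can tell, findall() in lxml goes through the limited ElementPath subset inherited from ElementTree, which has no axes such as following-sibling::, while the separate xpath() method evaluates full XPath 1.0. Note also that xpath() returns text nodes as plain strings, so there is no .text to read from them, and that the root element here is already en-note, so the path should not start with en-note.

```python
from lxml import etree

# Shortened copy of the note from the question (illustrative only).
note_content = """<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">
<en-note>
<div>
<en-todo />
Task 1
</div>
<div>
This is another to-do
<en-todo />
in an awkward place
</div>
</en-note>"""

parsed_note = etree.fromstring(note_content)

# Route 1: the tail attribute holds the text that directly follows an element.
tails = [(todo.tail or "").strip() for todo in parsed_note.iter("en-todo")]

# Route 2: xpath() accepts the following-sibling axis; each result is a string.
texts = [
    text.strip()
    for text in parsed_note.xpath("//en-todo/following-sibling::text()[1]")
]

print(tails)  # ['Task 1', 'in an awkward place']
print(texts)  # ['Task 1', 'in an awkward place']
```

Both lists come out the same for this sample, so the choice is mostly one of taste: tail stays within the ElementTree-style API, while xpath() keeps the original expression almost unchanged.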