I am a first-time XPath user and need to get the text values of the different elements in the document below, for instance the time, the title, and so on. I am using the libxml2 module in Python and so far have not had much luck getting just the text I need. The code below only returns the element nodes (the tags), but I need their values. Any help would be greatly appreciated!

I'm using this code:

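import libxml2

doc = libxml2.parseDoc(xmlOutput)  # xmlOutput holds the XML document shown below
result = doc.xpathEval('//*')      # one xmlNode object per element in the document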

With the following document:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE SCAN_LIST_OUTPUT SYSTEM "https://qualysapi.qualys.com/api/2.0/fo/sca/scan_list_output.dtd">
<SCAN_LIST_OUTPUT>
<RESPONSE>
<DATETIME>2012-01-22T01:21:53Z</DATETIME>
<SCAN_LIST>
  <SCAN>
    <REF>scan/2343423</REF>
    <TYPE>Scheduled</TYPE>
    <TITLE><![CDATA[customer 1 5/20/2012]]></TITLE>
    <USER_LOGIN>user1</USER_LOGIN>
    <LAUNCH_DATETIME>2012-02-21T04:11:05Z</LAUNCH_DATETIME>
    <STATUS>
      <STATE>Finished</STATE>
    </STATUS>
    <TARGET><![CDATA[13.3.3.2, 13.8.8.10, 13.10.12.60, 13.10.12.11...]]></TARGET>
  </SCAN>
</SCAN_LIST>
</RESPONSE>
</SCAN_LIST_OUTPUT>

Comments:
  • I'd strongly suggest lxml.etree, which uses the C libxml2 library under the hood but provides a far friendlier API. Commented May 22, 2012 at 1:54
  • You should really post that as an answer instead :-) Commented May 22, 2012 at 2:00
  • The question isn't really about XPath: the XPath call you're making works perfectly. Your question is about how to deal with the values it returns, which are no different from the element objects you'd get by iterating directly (i.e. without XPath). Updated summary and tagging appropriately. Commented May 22, 2012 at 12:21

2 Answers

You can call getContent() on each returned xmlNode object to retrieve its text. Note that getContent() is recursive: on an element it returns the concatenated text of that element and all of its descendants. To get only an element's own text in libxml2, retrieve the text node directly under the element and call .getContent() on that.
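A minimal sketch of the non-recursive approach, assuming the Python libxml2 bindings from the question are installed. The helper name text_values and the trimmed SAMPLE document are made up for illustration; the walk relies on the node attributes children, next, type and name, and on the assumption that text and CDATA nodes report their type as "text" and "cdata".

```python
# Trimmed copy of the document from the question (DOCTYPE omitted).
SAMPLE = """<?xml version="1.0" encoding="UTF-8" ?>
<SCAN_LIST_OUTPUT>
<RESPONSE>
<SCAN_LIST>
  <SCAN>
    <TITLE><![CDATA[customer 1 5/20/2012]]></TITLE>
    <STATUS>
      <STATE>Finished</STATE>
    </STATUS>
  </SCAN>
</SCAN_LIST>
</RESPONSE>
</SCAN_LIST_OUTPUT>"""


def text_values(xml_text, xpath="//*"):
    """Return (tag, own_text) pairs for every node matched by xpath.

    Hypothetical helper: only text held directly by each element is
    collected, unlike calling getContent() on the element itself.
    """
    import libxml2  # imported here so the sketch loads without the bindings

    doc = libxml2.parseDoc(xml_text)
    try:
        pairs = []
        for node in doc.xpathEval(xpath):
            parts = []
            child = node.children  # first child node, or None
            while child is not None:
                # Keep text and CDATA children; skip nested elements.
                if child.type in ("text", "cdata"):
                    parts.append(child.getContent())
                child = child.next  # next sibling, or None
            pairs.append((node.name, "".join(parts).strip()))
        return pairs
    finally:
        doc.freeDoc()  # documents from parseDoc must be freed explicitly
```

Calling text_values(SAMPLE, "//TITLE | //STATE") limits the result to those two elements; with the default "//*", container elements such as STATUS come back with an empty string, because their only direct text is whitespace.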

That said, this would be easier with lxml.etree (a higher-level Python API, still backed by the C libxml2 library) than with the Python libxml2 bindings: there, element.text gives you the element's own text content as a string.
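As a rough illustration of the element.text approach, here is a sketch that lists every element holding non-blank text. The helper name leaf_text and the trimmed SAMPLE document are assumptions made for the example. It falls back to the standard library's xml.etree.ElementTree when lxml is not installed, since both expose fromstring(), iter() and .text.

```python
try:
    from lxml import etree  # third-party, backed by libxml2
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same basic API

# Trimmed copy of the document from the question (DOCTYPE omitted).
# Bytes are used because lxml rejects str input that carries an
# encoding declaration.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<SCAN_LIST_OUTPUT>
<RESPONSE>
<SCAN_LIST>
  <SCAN>
    <REF>scan/2343423</REF>
    <TITLE><![CDATA[customer 1 5/20/2012]]></TITLE>
    <STATUS>
      <STATE>Finished</STATE>
    </STATUS>
  </SCAN>
</SCAN_LIST>
</RESPONSE>
</SCAN_LIST_OUTPUT>"""


def leaf_text(xml_bytes):
    """Return (tag, text) pairs for elements whose own text is non-blank."""
    root = etree.fromstring(xml_bytes)
    pairs = []
    # iter("*") visits every element in document order.
    for element in root.iter("*"):
        # .text is None for empty elements and whitespace for containers.
        text = (element.text or "").strip()
        if text:
            pairs.append((element.tag, text))
    return pairs


if __name__ == "__main__":
    for tag, text in leaf_text(SAMPLE):
        print(tag, text)
```

With lxml specifically, root.xpath("//TITLE") would return the matching elements for a full XPath 1.0 expression; that method is not available in the standard-library fallback.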

2 Comments

.getContent() did the trick, thank you so much! Any idea where to find a good reference or documentation for the rest of the functions? The API docs I was going through were hard to use, and I never came across .getContent() in them. Thanks again!
@user1377384 I found getContent() by running help(result[0]) at the REPL and reading its output.

Have a look at Chapter 12, "XML", of Mark Pilgrim's Dive Into Python 3.

The chapter starts with a short introduction to XML (general, but built around an Atom syndication feed example), then covers the standard xml.etree.ElementTree module, and finishes with the third-party lxml, which implements more behind the same interface (full XPath 1.0, based on libxml2).
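For the document in the question, the standard-library route might look like the sketch below. It uses only xml.etree.ElementTree, whose findall() and findtext() accept a limited XPath subset; the helper name scan_summaries, the dictionary keys, and the trimmed SAMPLE document are made up for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the document from the question (DOCTYPE omitted).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<SCAN_LIST_OUTPUT>
<RESPONSE>
<DATETIME>2012-01-22T01:21:53Z</DATETIME>
<SCAN_LIST>
  <SCAN>
    <REF>scan/2343423</REF>
    <TITLE><![CDATA[customer 1 5/20/2012]]></TITLE>
    <LAUNCH_DATETIME>2012-02-21T04:11:05Z</LAUNCH_DATETIME>
    <STATUS>
      <STATE>Finished</STATE>
    </STATUS>
  </SCAN>
</SCAN_LIST>
</RESPONSE>
</SCAN_LIST_OUTPUT>"""


def scan_summaries(xml_bytes):
    """Return one dict of selected text fields per SCAN element."""
    root = ET.fromstring(xml_bytes)
    return [
        {
            # findtext() returns the text of the first matching child,
            # or None when the path matches nothing.
            "ref": scan.findtext("REF"),
            "title": scan.findtext("TITLE"),
            "launched": scan.findtext("LAUNCH_DATETIME"),
            "state": scan.findtext("STATUS/STATE"),
        }
        for scan in root.findall(".//SCAN")  # every SCAN at any depth
    ]


if __name__ == "__main__":
    for summary in scan_summaries(SAMPLE):
        print(summary)
```

Selecting fields by name keeps the result independent of element order and skips the whitespace-only text of container elements such as STATUS.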
