Reading text from XML nodes using Python's libxml2

Question

I am a first-time XPath user and need to get the text values of several elements, for instance the time, title, etc. I am using the libxml2 module in Python and so far have not had much luck getting just the text I need. The code below only returns the element tags; I need the values. Any help would be greatly appreciated!

I'm using this code:

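import libxml2

doc = libxml2.parseDoc(xmlOutput)  # xmlOutput holds the XML document shown below
result = doc.xpathEval('//*')      # selects every element node in the document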

With the following document:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE SCAN_LIST_OUTPUT SYSTEM "https://qualysapi.qualys.com/api/2.0/fo/sca/scan_list_output.dtd">
<SCAN_LIST_OUTPUT>
<RESPONSE>
<DATETIME>2012-01-22T01:21:53Z</DATETIME>
<SCAN_LIST>
  <SCAN>
    <REF>scan/2343423</REF>
    <TYPE>Scheduled</TYPE>
    <TITLE><![CDATA[customer 1 5/20/2012]]></TITLE>
    <USER_LOGIN>user1</USER_LOGIN>
    <LAUNCH_DATETIME>2012-02-21T04:11:05Z</LAUNCH_DATETIME>
    <STATUS>
      <STATE>Finished</STATE>
    </STATUS>
    <TARGET><![CDATA[13.3.3.2, 13.8.8.10, 13.10.12.60, 13.10.12.11...]]></TARGET>
  </SCAN>
</SCAN_LIST>
</RESPONSE>
</SCAN_LIST_OUTPUT>

I'd strongly, strongly suggest lxml.etree, which uses the C libxml2 library under the hood but provides a far friendlier API.
– Charles Duffy, Commented May 22, 2012 at 1:54
The question isn't really about XPath -- the XPath call you're making works perfectly; your question is about how to deal with the values it returns, which aren't any different from the element objects you'd get iterating directly (i.e. not using XPath). Updated summary and tagging appropriately.
– Charles Duffy, Commented May 22, 2012 at 12:21

Charles Duffy · Accepted Answer · 2012-05-22 01:59:22Z

You can call getContent() on each returned xmlNode object to retrieve the associated text. Note that this is recursive: it returns the text of the element together with the text of all its descendants. To access text content non-recursively in libxml2, retrieve the text node(s) directly under the element and call .getContent() on those.

That said, this would be easier if you used lxml.etree (a higher-level Python API, still backed by the C libxml2 library) instead of the Python libxml2 bindings; in that case, element.text gives you the associated content as a string.
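
For comparison, a rough lxml.etree equivalent, assuming lxml is installed. element_texts is a hypothetical helper name, and the sample is trimmed from the question:

```python
from lxml import etree

# Trimmed-down sample based on the document in the question.
XML = """<SCAN>
  <REF>scan/2343423</REF>
  <TITLE><![CDATA[customer 1 5/20/2012]]></TITLE>
  <STATUS>
    <STATE>Finished</STATE>
  </STATUS>
</SCAN>"""


def element_texts(xml_text):
    """Map each element's tag to its .text value (hypothetical helper)."""
    root = etree.fromstring(xml_text)
    # iter("*") visits the root and every descendant element, skipping comments.
    return {element.tag: element.text for element in root.iter("*")}


for tag, text in element_texts(XML).items():
    print(tag, repr(text))
```

Note that element.text only holds the text that appears before the element's first child, so it is not recursive. If you want the recursive behaviour of getContent(), "".join(element.itertext()) is the closer match.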


2 Comments

user1377384 Over a year ago

.getContent() did the trick!!! THANK YOU SO MUCH :) Any idea where good reference documentation for the rest of the functions would be? The API docs I was going through were horrible and I never came across .getContent(). Thanks again!

Charles Duffy Over a year ago

@user1377384 I found getContent() by running help(result[0]) at the REPL and reading its output.

pepr · Answer · 2013-11-03 22:22:59Z

Have a look at Mark Pilgrim's Dive Into Python 3, Chapter 12. XML

The chapter starts with a short introduction to XML (general, but built around an Atom syndication feed example), then covers the standard xml.etree.ElementTree, and continues with the third-party lxml, which implements more behind the same interface (full XPath 1.0, based on libxml2).
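
As a small taste of the ElementTree interface, here is a sketch using only the standard library. The sample is trimmed from the question, and scan_summary and its dictionary keys are hypothetical names chosen for illustration:

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample based on the document in the question.
XML = """<SCAN_LIST>
  <SCAN>
    <REF>scan/2343423</REF>
    <TITLE><![CDATA[customer 1 5/20/2012]]></TITLE>
    <STATUS>
      <STATE>Finished</STATE>
    </STATUS>
  </SCAN>
</SCAN_LIST>"""


def scan_summary(xml_text):
    """Collect a few text values from every SCAN element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [
        {
            "ref": scan.findtext("REF"),
            "title": scan.findtext("TITLE"),
            # findtext accepts a simple path relative to the current element.
            "state": scan.findtext("STATUS/STATE"),
        }
        for scan in root.iter("SCAN")
    ]


for scan in scan_summary(XML):
    print(scan)
```

The same findtext and iter calls also work on lxml elements, which is what makes it easy to start with the standard library and switch later.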

edited Nov 3, 2013 at 22:22

answered May 23, 2012 at 7:59
