How can I access CDATA as a node using XPath in Java?

Question

Using this online XPath tester on the following XML

<a>foo <![CDATA[ MyCData]]>  baz</a>

with the XPath expression /a/text(), I get back all the text

foo <![CDATA[ MyCData]]>  baz

(This is structured as three nodes, as we can see using /a/text()[2], which returns baz.)

However, with javax.xml.xpath.XPath, the CData and the last text node are not returned at all. I get a single node with foo, and the remainder of the text <![CDATA[ MyCData]]> baz is just not available. Regardless of how XPath treats the XML structure, it is a bug if we cannot access nodes at all.

If I call setCoalescing(true) on the DocumentBuilderFactory, it concatenates all the text and CData nodes into one. I might end up using that, but it converts CData to escaped text in the output, which looks ugly, even if it is allowed by the standard. Also, I would prefer to be able to address the CData separately as some sort of node, whether "just" a text node, or else some special type of CData node.

By the way, if the CData is the only content of its parent element, with no spaces or other text in front, an ordinary text-content XPath retrieves it successfully, even with coalescing at its default (false). So it appears that the Java XPath always returns the first, and only the first, text node.

When I examine the full DOM tree of my DOM Document, with coalescing at its default, I find that the CData section is represented as its own node of type cdata-section, which is great, but how can I access this node in XPath?

Thanks, but that talks about XML inside CData. I just want the CData! In other XPath engines CData is simply a text node, but not in Java, as described.
– Joshua Fox, commented Aug 28, 2012 at 6:09

Michael Kay · Accepted Answer · 2012-08-28 08:17:13Z

The online XPath tester is getting it wrong, I'm afraid. According to the XPath data model, the <a> element has a single text node child whose string value is "foo MyCData baz"; there is no second text node, so a request for the second text node should return nothing.
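
As a rough illustration of that single-text-node view, here is a minimal sketch that assumes the JDK's built-in DOM parser and javax.xml.xpath implementation. It relies on setCoalescing(true), which the question mentions; the class name CoalescedTextDemo and the helper textNodes are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CoalescedTextDemo {

    // Hypothetical helper: parses the XML with coalescing enabled and
    // returns whatever /a/text() selects as a DOM NodeList.
    static NodeList textNodes(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Ask the parser to turn CDATA sections into text and merge them
        // with the adjacent text nodes.
        factory.setCoalescing(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("/a/text()", doc, XPathConstants.NODESET);
    }

    public static void main(String[] args) throws Exception {
        NodeList nodes = textNodes("<a>foo <![CDATA[ MyCData]]>  baz</a>");
        System.out.println("text nodes selected: " + nodes.getLength());
        // Brackets make the leading and inner whitespace visible.
        System.out.println("[" + nodes.item(0).getNodeValue() + "]");
    }
}
```

With coalescing on, the DOM that XPath sees already matches the data model described here, so /a/text() selects one node holding all of the character data, whitespace included.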

The XPath data model takes the view that CDATA is merely a convenient way of inputting data to avoid having to escape special characters; the presence of the CDATA does not affect the meaning or information content of the XML, so it is not made available to the application.
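
Since XPath does not expose CDATA sections, one possible workaround (not part of the original answer) is to walk the DOM directly, where the question reports seeing a separate cdata-section node when coalescing is left at its default. The sketch below assumes the JDK's default parser; the class name CDataWalkDemo and the helper cdataSections are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class CDataWalkDemo {

    // Hypothetical helper: returns the contents of every CDATA section
    // that is a direct child of the document element.
    static List<String> cdataSections(String xml) throws Exception {
        // Coalescing is left at its default (false), so the parser keeps
        // CDATA sections as separate DOM nodes.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        List<String> result = new ArrayList<>();
        for (Node child = doc.getDocumentElement().getFirstChild();
                child != null;
                child = child.getNextSibling()) {
            // Node.CDATA_SECTION_NODE is the DOM node type for CDATA.
            if (child.getNodeType() == Node.CDATA_SECTION_NODE) {
                result.add(child.getNodeValue());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (String cdata : cdataSections("<a>foo <![CDATA[ MyCData]]>  baz</a>")) {
            // Brackets make the leading whitespace visible.
            System.out.println("[" + cdata + "]");
        }
    }
}
```

This stays outside XPath entirely: it uses the DOM's own node types, so it works only with a tree model that preserves CDATA sections and only when the parser has not been told to coalesce them.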

7 Comments

Joshua Fox Over a year ago

OK, that would be great if the Java XPath returned a single node foo MyCData baz. But in fact, it returns a single node foo and no other nodes.

Joshua Fox Over a year ago

Apparently setCoalescing(true) gives the result you described. But what is the Java XPath engine doing in the case when coalescing is false? It seems to be not producing an alternate structure, but rather just "giving up" on all text but the first node.

Michael Kay Over a year ago

The Saxon XPath engine gives you a single text node containing all the data, whether or not the DOM is coalesced. Give it a try. (Even better, don't use DOM: switch to a better tree model, such as JDOM or XOM.)

Joshua Fox Over a year ago

"...whether or not the DOM is coalesced": I would rather not convert CData to escaped text in the output, as it looks ugly, even if it is standard. Also, I would prefer to be able to address the CData separately as some sort of node, whether "just" a text node as in the XPath spec, or else some special type of node.

Michael Kay Over a year ago

I'm telling you what the XPath spec says. The fact that you would prefer it to say something else isn't really relevant.
