Using this online XPath tester on the following XML

<a>foo <![CDATA[ MyCData]]>  baz</a>    

with the XPath expression /a/text(), I get back all the text

foo <![CDATA[ MyCData]]>  baz 

(This is structured as three nodes, as we can see using /a/text()[2] , which returns baz.)

However, with javax.xml.xpath.XPath, the CData and the last text node are not returned at all. I get a single node with foo, and the remainder of the text <![CDATA[ MyCData]]> baz is just not available. Regardless of how XPath treats the XML structure, it is a bug if we cannot access nodes at all.

However, if I call setCoalescing(true) on the DocumentBuilderFactory, it concatenates all the text and CData nodes into one. I might end up using that, but it converts CData to escaped text in the output, which looks ugly, even if it is allowed by the standard. Also, I would prefer to be able to address the CData separately as some sort of node, whether "just" a text node, or else some special type of CData node.
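For reference, here is a minimal sketch of the coalescing setup described above, using only the JDK's built-in parser and XPath engine. The class name CoalescingDemo and the helper textNodes are made up for illustration; the XML is the sample from the top of the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CoalescingDemo {
    static final String XML = "<a>foo <![CDATA[ MyCData]]>  baz</a>";

    // Hypothetical helper: parses the XML with coalescing switched on
    // and returns whatever /a/text() selects.
    static NodeList textNodes(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Merge CDATA sections into the adjacent text nodes at parse time.
            factory.setCoalescing(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/a/text()", doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        NodeList nodes = textNodes(XML);
        System.out.println(nodes.getLength());
        System.out.println("[" + nodes.item(0).getNodeValue() + "]");
    }
}
```

With coalescing on, the DOM holds a single text child under a, so the CData boundaries are no longer recoverable from the tree.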

By the way, if the CData is the only content of its parent element, with no spaces or other text in front, an ordinary text-content XPath retrieves it successfully, even with coalescing at its default (false). So it appears that the Java XPath always returns the first, and only the first, text node.

When I examine the full DOM tree of my DOM Document, with isCoalescing at its default, I find that the CData section is represented as its own node of type cdata-section, which is great, but how can I access this node in XPath?
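Outside XPath, the cdata-section nodes can at least be reached by walking the DOM directly. The following is only a sketch of that workaround, not an XPath solution; the class name CDataFinder and the helper cdataSections are hypothetical, and it assumes the default (non-coalescing) DocumentBuilderFactory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CDataFinder {
    static final String XML = "<a>foo <![CDATA[ MyCData]]>  baz</a>";

    // Hypothetical helper: returns the content of every CDATA section
    // that is a direct child of the document element.
    static List<String> cdataSections(String xml) {
        try {
            // Default factory settings: coalescing is off, so CDATA
            // sections are kept as separate DOM nodes.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> result = new ArrayList<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.CDATA_SECTION_NODE) {
                    result.add(child.getNodeValue());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints the list of CDATA contents found under <a>.
        System.out.println(cdataSections(XML));
    }
}
```

This only inspects direct children of the root; a recursive walk would be needed for CData nested deeper in the tree.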

  • Maybe this helps: stackoverflow.com/questions/4184858/… Commented Aug 27, 2012 at 21:45
  • Thanks, but that talks about XML inside CData. I just want the CData! In other XPath engines CData is simply a text node, but not in Java, as described. Commented Aug 28, 2012 at 6:09

1 Answer

The online XPath tester is getting it wrong, I'm afraid. According to the XPath data model, the <a> element has a single text node child whose string value is "foo  MyCData  baz"; there is no second text node, so a request for the second text node should return nothing.

The XPath data model takes the view that CDATA is merely a convenient way of inputting data to avoid having to escape special characters; the presence of the CDATA does not affect the meaning or information content of the XML, so it is not made available to the application.
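One way to check this view against the JDK's own engine is to ask for the string value of the element rather than for its text nodes. This is a rough sketch under the assumption that the default, non-coalescing DocumentBuilderFactory is used; the class name StringValueDemo and the helper stringValueOfA are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class StringValueDemo {
    static final String XML = "<a>foo <![CDATA[ MyCData]]>  baz</a>";

    // Hypothetical helper: evaluates string(/a) against a DOM built
    // with coalescing left at its default (false).
    static String stringValueOfA(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // The two-argument evaluate returns the result as a String.
            return XPathFactory.newInstance().newXPath()
                    .evaluate("string(/a)", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + stringValueOfA(XML) + "]");
    }
}
```

In the XPath data model the string value of an element is the concatenation of all its descendant text, so the CData content should appear here with no markup around it.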


Comments

OK, that would be great if the Java XPath returned a single node foo MyCData baz. But in fact, it returns a single node foo and no other nodes.
Apparently setCoalescing(true) gives the result you described. But what is the Java XPath engine doing in the case when coalescing is false? It seems to be not producing an alternate structure, but rather just "giving up" on all text but the first node.
The Saxon XPath engine gives you a single text node containing all the data, whether or not the DOM is coalesced. Give it a try. (Even better, don't use DOM: switch to a better tree model, such as JDOM or XOM).
"...whether or not the DOM is coalesced": I would rather not convert CData to escaped text in the output, as it looks ugly, even if it is standard. Also, I would prefer to be able to address the CData separately as some sort of node, whether "just" a text node as in the XPath spec, or else some special type of node.
I'm telling you what the XPath spec says. The fact that you would prefer it to say something else isn't really relevant.