
I want to load a very big XML file into a DOM tree (using JAXP), do some modifications and run XPath queries on the resulting DOM.
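For reference, the JAXP workflow described here (parse into a DOM, modify it, run an XPath query on the modified tree) looks roughly like this with the standard javax.xml APIs. This is a minimal sketch that parses from a string instead of a big file, and the element names are made up. The point relevant to the question is that XPath.evaluate() takes a plain org.w3c.dom node as its context item.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomXPathDemo {
    // Parse an XML string into a DOM Document using the default JAXP parser.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Parse, modify the tree, then count the matches of an XPath query on the modified DOM.
    static int addItemAndCount(String xml) throws Exception {
        Document doc = parse(xml);

        // Modification step: append a new <item> under the root element.
        Element item = doc.createElement("item");
        item.setAttribute("id", "new");
        doc.getDocumentElement().appendChild(item);

        // XPath step: the context item passed to evaluate() is the DOM Document itself.
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList items = (NodeList) xpath.evaluate("/root/item", doc, XPathConstants.NODESET);
        return items.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Two parsed <item> elements plus the appended one.
        System.out.println(addItemAndCount("<root><item id='a'/><item id='b'/></root>")); // prints 3
    }
}
```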

We use our own DOM implementation, which loads lazily: initially only the first two levels of the DOM are loaded from the file, and when getChildNodes()/etc. is called, we go back to the file and load more levels. This is very slow; however, it lets us load much bigger files, especially if we only use parts of the file.
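To make the lazy-loading idea concrete, here is a hypothetical sketch (not the poster's implementation, and deliberately not a full org.w3c.dom.Node): a node's children are read from a backing source only on the first getChildNodes() call and are cached afterwards. ChildSource and CountingMapSource are made-up names, and the map-based source merely stands in for going back to the XML file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical abstraction over "go back to the file and read the next level".
interface ChildSource {
    List<String> readChildNames(String path);
}

// Hypothetical in-memory stand-in for the file; counts how often it is consulted.
final class CountingMapSource implements ChildSource {
    private final Map<String, List<String>> tree;
    int reads = 0;

    CountingMapSource(Map<String, List<String>> tree) {
        this.tree = tree;
    }

    @Override
    public List<String> readChildNames(String path) {
        reads++;
        return tree.getOrDefault(path, Collections.emptyList());
    }
}

// Sketch of a lazily loaded node: children stay unloaded until first requested.
final class LazyNode {
    private final String path;
    private final ChildSource source;
    private List<LazyNode> children; // null until getChildNodes() is first called

    LazyNode(String path, ChildSource source) {
        this.path = path;
        this.source = source;
    }

    String path() {
        return path;
    }

    boolean isLoaded() {
        return children != null;
    }

    List<LazyNode> getChildNodes() {
        if (children == null) {
            // Only now consult the backing source, and only for this one level.
            List<LazyNode> loaded = new ArrayList<>();
            for (String name : source.readChildNames(path)) {
                loaded.add(new LazyNode(path + "/" + name, source));
            }
            children = Collections.unmodifiableList(loaded);
        }
        return children;
    }
}
```

The trade-off from the question shows up directly: every first access to a level costs a trip to the source, but levels that are never touched are never read.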

My question: as far as I know, XPath uses a different view of the XML. I'm curious whether the default Oracle JDK implementation converts the whole DOM document into some internal XPath document (which would be really bad, as it would eagerly load the whole document) or whether the XPath implementation can work directly on our DOM tree (i.e. no further loading if the XPath can be evaluated within the already loaded elements).

  • What do you mean by very big? And wouldn't your question be answered if you looked at the memory consumption during runtime? Commented Feb 21, 2013 at 0:15
  • Why do you believe that XPath is using something other than DOM? FWIW, there's at least one bug that I've seen that indicates it's using the DOM in its regular form (the bug involves traversing the entire DOM with searches based on a context deep within the tree). Commented Feb 21, 2013 at 20:05
  • The specs do not mention DOM as an underlying model, and the Apache Xalan-J implementation (which, as far as I know, is used in the Oracle JDK) seems to use some kind of DTM ( xml.apache.org/xalan-j/dtm.html ). However, I haven't yet found out whether the DOM is completely transformed into a DTM, whether only the specific parts required to evaluate the XPathExpression are transformed, or whether the DTM is just an adapter over the DOM. Commented Feb 21, 2013 at 20:33
  • @parsifal: See my answer below. Unfortunately, the most widely used XPath implementation converts the whole DOM (eagerly) into an internal format before evaluating the XPath; even if the XPath is just /root, the whole document is converted first... :( Commented Mar 2, 2013 at 19:54

2 Answers


This can be tested with a few lines of code: just feed your DOM to the XPath evaluator and put a few breakpoints/debug prints into your DOM methods. If they get called for elements that should not have been retrieved, then the evaluator builds its own tree. For example, query only the document's first child and see what it actually tries to retrieve.
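If instrumenting the DOM methods is not an option yet, a rough black-box variant of the same experiment is to time a query that only needs the document element (/*) against synthetic DOMs of increasing size. This is a sketch under stated assumptions: it uses the JDK's default XPathFactory and DOM, the document shape is made up, and single timings are noisy. If the time grows roughly with the number of children even though only the root is selected, the evaluator is touching far more of the tree than the query needs.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class RootQueryProbe {
    // Build a synthetic document: <root> with `children` empty <item/> elements.
    static Document buildDocument(int children) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        for (int i = 0; i < children; i++) {
            root.appendChild(doc.createElement("item"));
        }
        return doc;
    }

    // Evaluate a query that only needs the document element and return the matched node.
    static Node queryRoot(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Node) xpath.evaluate("/*", doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        for (int size : new int[] {1_000, 10_000, 100_000}) {
            Document doc = buildDocument(size);
            queryRoot(doc); // warm-up run, so one-time class loading is not included in the timing
            long start = System.nanoTime();
            Node root = queryRoot(doc);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println(size + " children: matched <" + root.getNodeName() + "> in " + micros + " us");
        }
    }
}
```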

Also, if your files are that big, maybe you should consider a radically different approach: SAX.
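As a sketch of what the SAX route looks like with the standard JAXP classes: the parser pushes events to a handler while streaming through the input, so no tree is kept in memory. Counting elements by name is just an illustrative task chosen here; for a real file you would pass a java.io.File to SAXParser.parse instead of the string-based InputSource.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxElementCounter {
    // Counts elements with the given (qualified) name while streaming; nothing is retained.
    static int count(String xml, String elementName) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        int[] count = {0}; // mutable holder, since the anonymous handler cannot assign locals
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                // The default factory is not namespace-aware, so qName carries the element name.
                if (qName.equals(elementName)) {
                    count[0]++;
                }
            }
        });
        return count[0];
    }
}
```

The cost of this design is that there is no random access: anything an XPath query would navigate to later has to be captured in the handler while the events go by.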


1 Comment

I've been lying a little bit: our lazy DOM implementation is not implemented yet, and we're currently researching whether it is even possible to use a lazy DOM tree afterwards (e.g. from XPath, among other access methods). So I was curious whether somebody knows how the default JDK XPath implementation behaves. XPath on DOM trees is going to be a common case, I guess.

Our DOM implementation is now finished, so I was able to test this:

Unfortunately, both the official JDK implementation and the current Xalan-J implementation convert the whole DOM tree into an internal data structure before evaluating the path.

This is really bad in any case, even if you don't have a lazy DOM implementation.
