I am testing the efficiency of DOM, SAX and StAX.

Basically, I use the Spring StopWatch with XML files of different sizes and then compare the results.

I also thought I could measure the time while elements are loaded into objects and the objects into an array, but that has nothing to do with parsing.

Here is my code for SAX:

  StopWatch stopWatch = new StopWatch("SAX");
  stopWatch.start("SAX");
  SAXParserFactory spf = SAXParserFactory.newInstance();
  spf.setValidating(false);
  SAXParser sp = spf.newSAXParser();
  XMLReader parser = sp.getXMLReader();
  parser.setErrorHandler(new Chyby());
  parser.setContentHandler(new DefaultHandler());
  parser.parse(file);
  stopWatch.stop();
  System.out.println(stopWatch.prettyPrint());

For StAX:

  int temp = 0;
  StopWatch stopWatch = new StopWatch("StAX");
  stopWatch.start("StAX");
  XMLInputFactory f = XMLInputFactory.newInstance();
  XMLStreamReader r = f.createXMLStreamReader(new FileInputStream(file));
  while (r.hasNext()) {
    temp++;
    r.next();
  }
  System.out.println("parsed");
  stopWatch.stop();
  System.out.println(stopWatch.prettyPrint());

And for DOM:

  StopWatch stopWatch = new StopWatch("DOM");
  stopWatch.start("DOM");
  DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
  DocumentBuilder builder = factory.newDocumentBuilder();
  Document document = builder.parse(subor);
  System.out.println("parsed");
  System.out.println("----------------\n");
  stopWatch.stop();
  System.out.println(stopWatch.prettyPrint());

My question is: am I doing it right? Is there another approach for testing parsers? Thanks.

2 Answers

2

Creating JAXP factory classes is a very expensive operation, and its cost depends highly on what JARs are present on the classpath. You don't really want to measure that.

You need to take care to eliminate Java start-up costs. Parse a few documents before you start measuring. Run the measurements repeatedly, average the results, and check that the results are consistent.
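To make the warm-up and averaging advice concrete, here is a minimal sketch, not a definitive harness. The class name `WarmedUpSaxTiming`, the run counts and the tiny in-memory sample document are all assumptions for illustration. The document is parsed from a byte array so that disk I/O does not enter the measurement, and the factory and parser are created outside the timed region.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; the name and run counts are illustrative only.
class WarmedUpSaxTiming {

    /** Times a single SAX parse in nanoseconds; parser creation is not timed. */
    static long timeOneParse(SAXParserFactory factory, byte[] xml) {
        try {
            SAXParser parser = factory.newSAXParser(); // untimed setup
            long start = System.nanoTime();
            parser.parse(new ByteArrayInputStream(xml), new DefaultHandler());
            return System.nanoTime() - start;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    /** Discards the warm-up parses, then returns the mean time of the measured parses. */
    static double averageNanos(byte[] xml, int warmupRuns, int measuredRuns) {
        SAXParserFactory factory = SAXParserFactory.newInstance(); // created once, untimed
        for (int i = 0; i < warmupRuns; i++) {
            timeOneParse(factory, xml); // result ignored: lets class loading and JIT settle
        }
        long total = 0;
        for (int i = 0; i < measuredRuns; i++) {
            total += timeOneParse(factory, xml);
        }
        return (double) total / measuredRuns;
    }

    public static void main(String[] args) {
        // Tiny stand-in document; a real test would load files of several sizes.
        byte[] xml = "<root><item id=\"1\">a</item><item id=\"2\">b</item></root>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.printf("mean parse time: %.0f ns%n", averageNanos(xml, 1000, 1000));
    }
}
```

Running the whole thing several times and comparing the means is a cheap way to check that the results are consistent.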

I would run the test with documents of different sizes. Typically the cost will be (ax+b) where x is the document size. The figure 'b' here represents the "per-document overhead" and can be quite significant if the documents are small.
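If you collect (document size, parse time) pairs, the two figures can be estimated with an ordinary least-squares fit. The sketch below is only an illustration: the class name `LinearCostFit` is made up, and the sample numbers in `main` are synthetic values generated from a known line, not measurements.

```java
// Hypothetical helper; fits time = a * size + b by ordinary least squares.
class LinearCostFit {

    /** Returns {a, b} for the best-fit line y = a*x + b through the given points. */
    static double[] fitLine(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs");
        }
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
        }
        double a = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX); // cost per unit of size
        double b = (sumY - a * sumX) / n;                                 // per-document overhead
        return new double[] {a, b};
    }

    public static void main(String[] args) {
        // Synthetic points lying exactly on y = 2x + 3, purely to exercise the fit.
        double[] sizes = {1, 2, 3, 4};
        double[] times = {5, 7, 9, 11};
        double[] ab = fitLine(sizes, times);
        System.out.println("a = " + ab[0] + ", b = " + ab[1]);
    }
}
```

A large `b` relative to `a*x` for your typical document size tells you the per-document overhead dominates.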

In the case of DOM there may well be garbage collections occurring which can distort the results because they happen at unpredictable times. Forcing garbage collection at known times is sometimes recommended to get consistent measurements.
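One way to do that is to request a collection just before the timer starts, as in this rough sketch. The class name `GcBeforeTiming` and the sample document are assumptions for illustration. Note that `System.gc()` is only a request to the JVM and may be ignored, so this reduces the noise rather than removing it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper; the name is illustrative only.
class GcBeforeTiming {

    /** Requests a garbage collection, then times the task in nanoseconds. */
    static long timeAfterGc(Runnable task) {
        System.gc(); // a hint only: the JVM is free to ignore it
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<root><item>a</item><item>b</item></root>"
                .getBytes(StandardCharsets.UTF_8);
        // Factory and builder are created before timing starts.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        long nanos = timeAfterGc(() -> {
            try {
                builder.parse(new ByteArrayInputStream(xml));
            } catch (Exception e) {
                throw new IllegalStateException("parse failed", e);
            }
        });
        System.out.println("DOM parse took " + nanos + " ns");
    }
}
```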


1

You may want to factor the creation of the factories out of the performance run, or measure it separately. You will probably also want to touch all the data, to prevent a parser from falsely looking good if it builds objects lazily.
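As a rough sketch of what "touching all the data" could look like without printing anything: both methods below read every text node and attribute value as a String and only add up their lengths. The class name `TouchAllData` and the sample document are made up for illustration, and timing is left out to keep it short. In a real run the factories would be created before the timer starts.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical helper; the name is illustrative only.
class TouchAllData {

    /** StAX: reads every attribute value and text chunk, returns their total length. */
    static long touchWithStax(byte[] xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(xml));
            long chars = 0;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        chars += r.getAttributeValue(i).length();
                    }
                } else if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) {
                    chars += r.getText().length();
                }
            }
            r.close();
            return chars;
        } catch (Exception e) {
            throw new IllegalStateException("StAX parse failed", e);
        }
    }

    /** DOM: builds the tree, then walks it so that lazily built nodes are forced. */
    static long touchWithDom(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(new ByteArrayInputStream(xml));
            return touch(doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }

    /** Recursively sums the lengths of text and attribute values under a node. */
    private static long touch(Node node) {
        long chars = 0;
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            chars += node.getNodeValue().length();
        }
        NamedNodeMap attrs = node.getAttributes(); // null for non-element nodes
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                chars += attrs.item(i).getNodeValue().length();
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            chars += touch(c);
        }
        return chars;
    }

    public static void main(String[] args) {
        byte[] xml = "<root><item id=\"1\">a</item><item id=\"22\">bc</item></root>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println("StAX: " + touchWithStax(xml) + ", DOM: " + touchWithDom(xml));
    }
}
```

Returning a number that depends on every value also gives you a sanity check: the parsers should agree on it for the same document.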

3 Comments

I don't understand the second sentence. Should I measure with building objects or not? :)
@sevdah - by "touch all the data" I meant get all the text node and attribute values as Strings, to ensure the test is as even as possible. A lazy DOM implementation, for example, could do nothing on a parse call and only build the DOM nodes as the tree is traversed, making the parse call appear super fast.
That was my first version of the test, but then I realized that my code for getting all the nodes may not be equally "well written" for each parser. Another thing is that I printed all the text nodes with System.out.println, and that also takes some time. So should I traverse the whole document, but write nothing to the display?
