Java parsers testing

Question

I am testing the efficiency of DOM, SAX, and StAX.

Basically, I use the Spring StopWatch with XML files of different sizes and then compare the results.

I also thought that I could measure the time while elements are loaded into objects and the objects into an array, but that has nothing to do with parsing.

Here is my code for SAX:

    StopWatch stopWatch = new StopWatch("SAX");
    stopWatch.start("SAX");
    SAXParserFactory spf = SAXParserFactory.newInstance();
    spf.setValidating(false);
    SAXParser sp = spf.newSAXParser();
    XMLReader parser = sp.getXMLReader();
    parser.setErrorHandler(new Chyby());
    parser.setContentHandler(new DefaultHandler());
    parser.parse(file);
    stopWatch.stop();
    System.out.println(stopWatch.prettyPrint());

For StAX:

    int temp = 0;
    StopWatch stopWatch = new StopWatch("StAX");
    stopWatch.start("StAX");
    XMLInputFactory f = XMLInputFactory.newInstance();
    XMLStreamReader r = f.createXMLStreamReader(new FileInputStream(file));
    while (r.hasNext()) {
        temp++;
        r.next();
    }
    System.out.println("parsed");
    stopWatch.stop();
    System.out.println(stopWatch.prettyPrint());

For DOM:

    StopWatch stopWatch = new StopWatch("DOM");
    stopWatch.start("DOM");
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.parse(subor);
    System.out.println("parsed");
    System.out.println("----------------\n");
    stopWatch.stop();
    System.out.println(stopWatch.prettyPrint());

My question is: am I doing it right? Is there another approach to testing parsers? Thanks.

Michael Kay · Accepted Answer · 2013-05-02 06:51:05Z

Creating JAXP factory classes is a very expensive operation, and its cost depends highly on what JARs are present on the classpath. You don't really want to measure that.
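
As a rough sketch of that point, the factory and parser can be created before the clock starts so that only the parse call is timed. The class name `SaxTimingSketch` and the helper `timeParseNanos` are hypothetical, and `System.nanoTime()` stands in for the Spring StopWatch only to keep the example self-contained:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxTimingSketch {
    // Hypothetical helper: returns the time spent in the parse call only.
    static long timeParseNanos(String xml) {
        try {
            // Untimed setup: factory lookup, parser creation, input bytes.
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false);
            SAXParser parser = factory.newSAXParser();
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);

            // Timed region: just the parse itself.
            long start = System.nanoTime();
            parser.parse(new ByteArrayInputStream(bytes), new DefaultHandler());
            return System.nanoTime() - start;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        long nanos = timeParseNanos("<root><item id=\"1\">text</item></root>");
        System.out.println("SAX parse took " + nanos + " ns");
    }
}
```

If the factory cost is of interest too, it could be timed in a separate measurement rather than mixed into the parse figure.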

You need to take care to eliminate Java start-up costs. Parse a few documents before you start measuring. Run the measurements repeatedly, average the results, and check that the results are consistent.
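
One possible shape for such a harness is sketched below: a few untimed warm-up runs, then repeated timed runs whose mean and standard deviation can be compared. The names `RepeatedTimingSketch`, `measure`, `mean`, and `stdDev` are made up for illustration, and the run counts (50 warm-ups, 20 samples) are arbitrary assumptions, not recommended values:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class RepeatedTimingSketch {
    // Runs the task untimed a few times, then returns one sample per timed run.
    static long[] measure(Runnable task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long[] samples = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    static double mean(long[] samples) {
        return Arrays.stream(samples).average().orElse(0.0);
    }

    // Population standard deviation; a large value suggests inconsistent runs.
    static double stdDev(long[] samples) {
        if (samples.length == 0) {
            return 0.0;
        }
        double mean = mean(samples);
        double sumSquares = 0.0;
        for (long sample : samples) {
            sumSquares += (sample - mean) * (sample - mean);
        }
        return Math.sqrt(sumSquares / samples.length);
    }

    public static void main(String[] args) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        byte[] xml = "<root><item>text</item></root>".getBytes(StandardCharsets.UTF_8);
        Runnable parseOnce = () -> {
            try {
                XMLStreamReader reader =
                        factory.createXMLStreamReader(new ByteArrayInputStream(xml));
                while (reader.hasNext()) {
                    reader.next();
                }
                reader.close();
            } catch (XMLStreamException e) {
                throw new IllegalStateException(e);
            }
        };
        long[] samples = measure(parseOnce, 50, 20);
        System.out.printf("mean=%.0f ns, stddev=%.0f ns%n", mean(samples), stdDev(samples));
    }
}
```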

I would run the test with documents of different sizes. Typically the cost will be (ax+b) where x is the document size. The figure 'b' here represents the "per-document overhead" and can be quite significant if the documents are small.
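
To estimate 'a' and 'b' from measurements at several sizes, an ordinary least-squares line fit is one option. The sketch below assumes that approach; `LinearCostFit` and `fit` are hypothetical names, and the numbers in `main` are invented purely to show the arithmetic, not real measurements:

```java
class LinearCostFit {
    // Least-squares fit of times = a * sizes + b; returns {a, b}.
    static double[] fit(double[] sizes, double[] times) {
        int n = sizes.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
        for (int i = 0; i < n; i++) {
            sumX += sizes[i];
            sumY += times[i];
            sumXY += sizes[i] * times[i];
            sumXX += sizes[i] * sizes[i];
        }
        double a = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
        double b = (sumY - a * sumX) / n;
        return new double[] {a, b};
    }

    public static void main(String[] args) {
        // Made-up illustration data: document sizes and parse times.
        double[] sizes = {1, 2, 3, 4};
        double[] times = {5, 7, 9, 11};
        double[] ab = fit(sizes, times);
        System.out.println("a (per-unit cost) = " + ab[0]);
        System.out.println("b (per-document overhead) = " + ab[1]);
    }
}
```

With at least two distinct document sizes, the intercept gives a rough figure for the per-document overhead and the slope for the cost per unit of size.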

In the case of DOM there may well be garbage collections occurring which can distort the results because they happen at unpredictable times. Forcing garbage collection at known times is sometimes recommended to get consistent measurements.
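
A minimal sketch of that idea is to request a collection just before the timed region. Note that `System.gc()` is only a request to the JVM and is not guaranteed to run a collection; `DomGcTimingSketch` and `timeDomParseNanos` are hypothetical names:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class DomGcTimingSketch {
    // Hypothetical helper: asks for a GC, then times a single DOM parse.
    static long timeDomParseNanos(String xml) {
        try {
            // Untimed setup.
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);

            // Request a collection at a known point, before timing starts.
            // This is a hint only; the JVM may ignore it.
            System.gc();

            long start = System.nanoTime();
            Document document = builder.parse(new ByteArrayInputStream(bytes));
            long elapsed = System.nanoTime() - start;

            // Use the result so the parse cannot be treated as dead code.
            if (document.getDocumentElement() == null) {
                throw new IllegalStateException("no document element");
            }
            return elapsed;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        long nanos = timeDomParseNanos("<root><item>text</item></root>");
        System.out.println("DOM parse took " + nanos + " ns");
    }
}
```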

bdoughan · Accepted Answer · 2013-05-02 00:39:21Z

You may want to factor the creation of the factories out of the performance run or measure them separately. You will probably want to touch all the data to prevent a parser from falsely looking good if it lazily builds objects.

3 Comments

ivanz Over a year ago

I don't understand the second sentence. Should I measure with building objects or not? :)

bdoughan Over a year ago

@sevdah - By "touch all the data" I meant get all the text node and attribute values as Strings to ensure the test is as even as possible. A lazy DOM implementation, for example, could do nothing on a parse call and only build the DOM nodes as the tree is traversed, making the parse call appear super fast.
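
For DOM, touching all the data might look like the sketch below: walk the whole tree and read every text and attribute value as a String, accumulating a character count instead of printing anything. `DomTouchAllSketch`, `touchAll`, and `parseAndTouch` are hypothetical names, and returning a count is just one way to make sure the values are really read:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

class DomTouchAllSketch {
    // Recursively reads every text and attribute value; returns total chars.
    static long touchAll(Node node) {
        long chars = 0;
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            chars += node.getNodeValue().length();
        }
        NamedNodeMap attributes = node.getAttributes(); // null unless an element
        if (attributes != null) {
            for (int i = 0; i < attributes.getLength(); i++) {
                chars += attributes.item(i).getNodeValue().length();
            }
        }
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            chars += touchAll(child);
        }
        return chars;
    }

    static long parseAndTouch(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return touchAll(document);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAndTouch("<a x=\"12\"><b>abc</b><c>de</c></a>"));
    }
}
```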

ivanz Over a year ago

That was my first version of the test, but then I realized that my code for getting all the nodes doesn't have to be equally "well written" for each parser. Another thing is that I printed all the text nodes with System.out.println, and that also takes some time. So should I traverse the whole document but write nothing to the display?
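
One way to traverse everything without printing, sketched here for StAX, is to read each text and attribute value and add its length to a running total, so the values are materialized but no console I/O lands in the timed region. `StaxTouchAllSketch` and `parseAndTouch` are hypothetical names, and summing lengths is only an assumed stand-in for "using" the data:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxTouchAllSketch {
    // Reads every attribute and text value as a String; returns total chars.
    static long parseAndTouch(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader = factory.createXMLStreamReader(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            long chars = 0;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        chars += reader.getAttributeValue(i).length();
                    }
                } else if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) {
                    chars += reader.getText().length();
                }
            }
            reader.close();
            return chars;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // The total is printed once, after the work that would be timed.
        System.out.println(parseAndTouch("<a x=\"12\"><b>abc</b><c>de</c></a>"));
    }
}
```

Doing the same kind of accumulation in the SAX handler and the DOM walk would keep the per-parser work comparable.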
