I want to parse a file that is similar to an HTML file but is not exactly HTML: it can contain some user-defined tags. I don't know in advance how the tags are nested in one another. The tags may also have attributes. I think I should use a SAX parser. Does Java have a built-in SAX parser? Can I call a function each time I encounter a tag?
3 Answers
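Use the following packages: java.io, javax.xml.parsers, org.xml.sax (plus org.xml.sax.helpers for DefaultHandler).
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser parser = spf.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setContentHandler(new MyContentHandler());
// Use the XMLReader to parse the entire file.
// InputSource(String) takes a system identifier (a URI), so convert the file name first.
InputSource is = new InputSource(new File(filename).toURI().toString());
reader.parse(is);
// Extends DefaultHandler, which supplies empty implementations of the ContentHandler methods;
// override only the callbacks you need.
class MyContentHandler extends DefaultHandler {
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // Called once for each opening tag; attributes holds that tag's attributes.
    }
}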
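To address the last part of the question (calling a function for each tag), here is a minimal self-contained sketch. The names TagLogger, SaxDemo and listTags, and the sample tags, are made up for illustration; it parses from a string rather than a file so it can be run as-is, and it assumes the input is well-formed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records one line per opening tag, with depth and attributes.
class TagLogger extends DefaultHandler {
    final List<String> seen = new ArrayList<>();
    private int depth = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // The parser calls this once per opening tag, whatever the tag is named.
        StringBuilder line = new StringBuilder(depth + ":" + qName);
        for (int i = 0; i < attrs.getLength(); i++) {
            line.append(' ').append(attrs.getQName(i)).append('=').append(attrs.getValue(i));
        }
        seen.add(line.toString());
        depth++;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Called once per closing tag; keeps the nesting depth in sync.
        depth--;
    }
}

class SaxDemo {
    // Parses the given text and returns one entry per opening tag.
    static List<String> listTags(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TagLogger handler = new TagLogger();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.seen;
        } catch (Exception e) {
            // Malformed input or I/O problems end up here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<page><mytag id=\"7\"><b>hi</b></mytag></page>";
        for (String line : listTags(xml)) {
            System.out.println(line);
        }
    }
}
```

Because the handler only looks at qName and the Attributes object, it does not need to know the tag names or their nesting in advance.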
I think you should use StAX instead, which is faster and easier to use than SAX. It's part of Java SE 6.
2 Comments
cletus
I disagree with it being easier to use. startElement() in SAX essentially passes you a map of attributes. With StAX you otherwise have to write a more complicated piece of code to derive this information.
gustafc
On the other hand, StAX lets you parse XML documents with a simple recursive descent parser where the call stack matches the element stack. Using SAX you'd have to write a state machine, which requires a lot more boilerplate and which at least I consider a lot harder to get right than a util method reading the attributes from a StAX cursor into a map.
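A rough sketch of what both comments describe, using the javax.xml.stream cursor API: a small utility that copies the current tag's attributes into a map, and a recursive method whose call depth follows the element depth. The names StaxDemo, attributes, element and outline are hypothetical, and the input is assumed to be well-formed.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxDemo {
    // Utility: copy the attributes of the current START_ELEMENT into a map.
    static Map<String, String> attributes(XMLStreamReader r) {
        Map<String, String> attrs = new LinkedHashMap<>();
        for (int i = 0; i < r.getAttributeCount(); i++) {
            attrs.put(r.getAttributeLocalName(i), r.getAttributeValue(i));
        }
        return attrs;
    }

    // Recursive descent: the reader sits on a START_ELEMENT. Each child element
    // is handled by a nested call, so the call stack mirrors the element stack.
    static void element(XMLStreamReader r, int depth, StringBuilder out)
            throws XMLStreamException {
        out.append(depth).append(':').append(r.getLocalName())
           .append(attributes(r)).append('\n');
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                element(r, depth + 1, out);
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                return; // this element is closed; unwind one level
            }
        }
    }

    // Returns one line per element: depth, name and attribute map.
    static String outline(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            StringBuilder out = new StringBuilder();
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) {
                    element(r, 0, out);
                }
            }
            r.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline("<page><mytag id=\"7\"><b>hi</b></mytag></page>"));
    }
}
```

No explicit depth counter or state variable is kept between callbacks here; the depth parameter is simply passed down the recursion, which is the trade-off against the SAX handler style.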
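SAX was originally a Java-only API, so yes, Java has a built-in SAX parser. It will only work if your document is well-formed XML, though: HTML-style markup such as an unclosed <br> or an unquoted attribute value makes the parser stop with an error.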
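To see the well-formedness requirement in practice, a small check like the following can be run against sample input before committing to SAX. WellFormedCheck and parses are hypothetical names, and the sample strings are invented.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Returns true if the built-in SAX parser accepts the text, false if it
    // rejects it as malformed.
    static boolean parses(String text) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(text)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // typically a SAXParseException for malformed input
        } catch (Exception e) {
            // Parser configuration or I/O problems, unrelated to the markup.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<p>one<br/>two</p>")); // every tag is closed
        System.out.println(parses("<p>one<br>two</p>"));  // <br> is never closed
    }
}
```

If real input fails a check like this, it would need to be cleaned up into well-formed XML before a SAX (or StAX) parser can process it.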