1

I want to parse a file that is similar to an HTML file, but it is not exactly HTML: it can contain some user-defined tags. I don't know in advance how the tags are nested in one another. The tags may also have attributes. I think I should use a SAX parser. Does Java have a built-in SAX parser? Can I call a function when I encounter each tag?

3 Answers

4

Use the following packages: java.io, javax.xml.parsers, and org.xml.sax.

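import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser parser = spf.newSAXParser();
XMLReader reader = parser.getXMLReader();

// Register the handler whose callbacks fire as the parser meets each tag.
reader.setContentHandler(new MyContentHandler());

// Let the XMLReader parse the entire file.
// InputSource(String) expects a system identifier (URI), so convert the file name.
InputSource is = new InputSource(new File(filename).toURI().toString());
reader.parse(is);

// Extending DefaultHandler (which implements ContentHandler) means you only
// override the callbacks you need instead of every ContentHandler method.
class MyContentHandler extends DefaultHandler {
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Called once for every opening tag; atts holds that tag's attributes.
    }
}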
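To make the "call a function for each tag" part concrete, here is a minimal, self-contained sketch that parses an in-memory string rather than a file. The class and method names (SaxTagDemo, TagLogger, listTags) and the sample tags are made up for illustration, and it assumes the input is well-formed XML. Checked exceptions are wrapped in an unchecked one purely for brevity.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxTagDemo {

    /** Hypothetical handler: records one entry per opening tag, e.g. "widget{id=w1}". */
    static class TagLogger extends DefaultHandler {
        final List<String> seen = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            // qName is the tag name as written in the document.
            StringBuilder sb = new StringBuilder(qName).append('{');
            for (int i = 0; i < atts.getLength(); i++) {
                if (i > 0) sb.append(", ");
                sb.append(atts.getQName(i)).append('=').append(atts.getValue(i));
            }
            seen.add(sb.append('}').toString());
        }
    }

    /** Parses the given text and returns the recorded opening tags in document order. */
    static List<String> listTags(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TagLogger handler = new TagLogger();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.seen;
        } catch (Exception e) {
            // Wrapped for brevity; real code would handle SAXException etc. separately.
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String doc = "<page><widget id=\"w1\"><b>hi</b></widget></page>";
        listTags(doc).forEach(System.out::println);
    }
}
```

The parser does not need to know the tag names or their nesting in advance: startElement() fires for whatever tags it meets, and endElement() can be overridden in the same way if you need to track nesting depth.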

2

I think you should use StAX instead, which is faster and easier to use than SAX. It has been part of Java SE since version 6 (package javax.xml.stream).

2 Comments

I disagree that it is easier to use. startElement() in SAX essentially passes you a map of attributes (an Attributes object). With StAX you otherwise have to write a more complicated piece of code to derive this information.
On the other hand, StAX lets you parse XML documents with a simple recursive descent parser where the call stack matches the element stack. Using SAX you'd have to write a state machine, which requires a lot more boilerplate and which at least I consider a lot harder to get right than a util method reading the attributes from a StAX cursor into a map.
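For comparison, here is a rough sketch of the StAX approach described in the comments: a small utility method copies the attributes from the cursor into a map, and a recursive method reads one element so that the call stack mirrors the element stack. The Node class and all method names are hypothetical and exist only for this illustration, and the input is assumed to be well-formed XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxTreeDemo {

    /** Hypothetical tree node, used only for this illustration. */
    static class Node {
        final String name;
        final Map<String, String> attrs;
        final List<Node> children = new ArrayList<>();

        Node(String name, Map<String, String> attrs) {
            this.name = name;
            this.attrs = attrs;
        }
    }

    /** Utility: copies the attributes of the current START_ELEMENT into a map. */
    static Map<String, String> attributes(XMLStreamReader r) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i < r.getAttributeCount(); i++) {
            m.put(r.getAttributeLocalName(i), r.getAttributeValue(i));
        }
        return m;
    }

    /**
     * Recursive descent: on entry the cursor is on a START_ELEMENT,
     * on return it is on the matching END_ELEMENT.
     */
    static Node readElement(XMLStreamReader r) throws XMLStreamException {
        Node node = new Node(r.getLocalName(), attributes(r));
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                node.children.add(readElement(r)); // child: recurse one level deeper
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                return node;                       // our own closing tag
            }
            // Text, comments, etc. are ignored in this sketch.
        }
        throw new XMLStreamException("document ended inside <" + node.name + ">");
    }

    /** Parses the text and returns the root element as a Node tree. */
    static Node parse(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            // Skip forward to the root element.
            while (r.hasNext() && r.next() != XMLStreamConstants.START_ELEMENT) {
                // nothing to do
            }
            return readElement(r);
        } catch (XMLStreamException e) {
            // Wrapped for brevity.
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

Calling StaxTreeDemo.parse(text) gives you the root Node, whose children list reflects the nesting found in the document. No explicit state machine is needed because each recursive call tracks exactly one open element.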
0

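SAX was originally a Java-only API, so yes, Java has a built-in SAX parser. However, it will only work if your document is well-formed XML; HTML-style shortcuts such as unclosed tags or unquoted attribute values will make the parser fail.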
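A quick way to see the well-formedness requirement in practice is to let the built-in parser try the input and catch the failure. This is only a sketch: the class and method names (WellFormedCheck, isWellFormed) are invented here, and the handler does nothing except let parse errors propagate.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {

    /** Returns true if the built-in SAX parser accepts the text as well-formed XML. */
    static boolean isWellFormed(String text) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(text)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // The parser rejected the input (for example, a tag was never closed).
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Not a property of the input; wrapped for brevity.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p>one<br/>two</p>")); // every tag is closed
        System.out.println(isWellFormed("<p>one<br>two</p>"));  // <br> is never closed
    }
}
```

If your HTML-like files fail this kind of check, they would need to be cleaned up into well-formed XML before a SAX (or StAX) parser can process them.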
