I have this method, which loads an XHTML document from a java.io.InputStream and returns an org.w3c.dom.Document.
    private Document loadDocFrom(InputStream is) throws SAXException,
            IOException, ParserConfigurationException {
        DocumentBuilderFactory domFactory = DocumentBuilderFactory
                .newInstance();
        domFactory.setNamespaceAware(true); // never forget this
        DocumentBuilder builder = domFactory.newDocumentBuilder();
        Document doc = builder.parse(is);
        is.close();
        return doc;
    }
This method works: I have tested it with some XHTML documents (e.g. http://pastebin.com/L2kHwggU) and XHTML websites.
But for some documents, such as http://pastebin.com/v675yWSJ, or even websites like www.w3.org, it seems to enter an infinite loop at Document doc = builder.parse(is); (the call never returns).
EDIT:
@Michael Kay found the problem, but I am waiting for his solution.
Another possible solution is to skip loading the external DTD (note that this feature name is specific to Xerces-based parsers):
domFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
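For anyone who wants to try it, here is a minimal sketch of how the DTD-skipping workaround could be wired into the method above. The class name XhtmlLoader is hypothetical, and I made the method static and package-private only so it can be called from outside. It assumes a Xerces-based parser (the JDK's built-in one should accept this feature). Other parsers may throw a ParserConfigurationException for the unknown feature.

```java
import java.io.IOException;
import java.io.InputStream;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical wrapper class, used only to make the sketch self-contained.
class XhtmlLoader {

    // Xerces-specific feature: when false, the external DTD named in the
    // DOCTYPE is not fetched, so no network request is made for it.
    private static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    static Document loadDocFrom(InputStream is)
            throws SAXException, IOException, ParserConfigurationException {
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        // Assumption: the underlying parser recognises this feature.
        domFactory.setFeature(LOAD_EXTERNAL_DTD, false);
        DocumentBuilder builder = domFactory.newDocumentBuilder();
        // try-with-resources closes the stream even if parsing fails.
        try (InputStream in = is) {
            return builder.parse(in);
        }
    }
}
```

One caveat: with the DTD skipped, named entities that are only declared there (such as &nbsp;) may no longer be resolved, so this is a trade-off rather than a complete fix.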
Thank you for your help.