I am trying to parse URLs from a specific part of a website. For this purpose I wrote a simple program in Java, but it throws a NullPointerException. It seems that getNamedItem("href") returns null, so I suspect I am using getNamedItem the wrong way to extract the URLs from the "href" attribute.
DocumentBuilder b = DocumentBuilderFactory.newInstance().newDocumentBuilder();
org.w3c.dom.Document doc = b.parse(new FileInputStream("clean.html"));

// Evaluate XPath against the Document itself
javax.xml.xpath.XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodes = (NodeList) xPath.evaluate(".//*[@class='r_news_box']",
        doc.getDocumentElement(), XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); ++i) {
    Element e = (Element) nodes.item(i);
    // NullPointerException is thrown on this line
    System.out.println(e.getAttributes().getNamedItem("href").getTextContent());
}
P.S.: here is one of the nodes that should be selected by this XPath:
<div class="r_news_box">
    <a class="picLink" target="_blank" href="/fa/news/427583/test">
        <img class="r_news_img" width="50" height="65" src="/files/fa/news/1393/5/29/411217_553.jpg" alt="test"/>
    </a>
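
If my suspicion is right and the real problem is that the XPath matches the <div class="r_news_box"> (which has no href attribute of its own) rather than the nested <a>, I guess the loop would need to look something like the sketch below (my guess, not verified), reusing the xPath and doc objects from the code above. Is this the correct way, or is there a better approach?

// Sketch: target the nested <a> elements that actually carry an href,
// instead of the surrounding div, and read the attribute with getAttribute
NodeList links = (NodeList) xPath.evaluate(
        ".//*[@class='r_news_box']//a[@href]",
        doc.getDocumentElement(), XPathConstants.NODESET);
for (int i = 0; i < links.getLength(); ++i) {
    Element a = (Element) links.item(i);
    // getAttribute returns an empty string (not null) when href is missing
    System.out.println(a.getAttribute("href"));
}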