I have a requirement where I hit a link and get a response. The response is XML data containing child links. The response is copied to a file, and the child links are added to a queue; I then have to hit each child link in turn until there are no further children.
I first did this using a single queue. But since it was slow, I tried to implement an executor. I do not have to maintain the order of the data. This is my approach now:
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.AbstractQueue;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;

public class Hierarchy2 {

    private static AbstractQueue<String> queue = new ConcurrentLinkedQueue<>();
    private static FileWriter writer;
    private static XMLHandler xmlHandler = new XMLHandler();

    public static void main(String[] args) throws IOException, ParserConfigurationException, SAXException {
        writer = new FileWriter(new File("hierarchy.txt"));
        String baseUrl = "my url goes here";
        queue.add(baseUrl);
        int threadCount = Runtime.getRuntime().availableProcessors() + 1;
        ExecutorService executor = Executors.newFixedThreadPool(threadCount);
        for (int i = 0; i < threadCount; i++) {
            executor.execute(new QueueProcess(queue, writer, xmlHandler));
        }
        executor.shutdown();
    }
}

class QueueProcess implements Runnable {

    private AbstractQueue<String> queue;
    private HttpURLConnection connection;
    private URL url;
    private FileWriter writer;
    private SAXParserFactory factory = SAXParserFactory.newInstance();
    private SAXParser saxParser;
    private XMLHandler xmlHandler;

    public QueueProcess(AbstractQueue<String> queue, FileWriter writer, XMLHandler xmlHandler) {
        this.queue = queue;
        this.writer = writer;
        this.xmlHandler = xmlHandler;
    }

    @Override
    public void run() {
        try {
            saxParser = factory.newSAXParser();
            while (true) {
                String link = queue.poll();
                if (link != null) {
                    if (queue.size() >= 500) {
                        System.out.println("here" + " " + Thread.currentThread().getName());
                        getChildLinks(link);
                    } else {
                        System.out.println(link + " " + Thread.currentThread().getName());
                        queue.addAll(getChildLinks(link));
                    }
                }
            }
        } catch (IOException | SAXException | ParserConfigurationException e) {
            e.printStackTrace();
        }
    }

    private List<String> getChildLinks(String link) throws IOException, SAXException {
        url = new URL(link);
        connection = (HttpURLConnection) url.openConnection();
        connection.connect();
        String result = new BufferedReader(new InputStreamReader(connection.getInputStream())).lines()
                .collect(Collectors.joining());
        saxParser.parse(new ByteArrayInputStream(result.getBytes()), xmlHandler);
        List<String> urlList = xmlHandler.getURLList();
        writer.write(result + System.lineSeparator());
        connection.disconnect();
        return urlList;
    }
}
The code throws an NPE in some places. That needs to be fixed, and I will fix it. However, is concurrent access to the FileWriter safe?
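To make the FileWriter question concrete, here is a minimal sketch of one way the writes could be serialized so that each response and its line separator stay together. `RecordWriter` and `writeRecord` are hypothetical names that are not in my code above, and a `StringWriter` stands in for the `FileWriter` so the example runs without touching disk. This is a sketch under those assumptions, not a tested replacement.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical wrapper: all threads share one instance and write through it. */
public class RecordWriter {
    private final Writer out;

    public RecordWriter(Writer out) {
        this.out = out;
    }

    /** Writes one record followed by a line separator; only one thread at a time may enter. */
    public synchronized void writeRecord(String record) {
        try {
            out.write(record);
            out.write(System.lineSeparator());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StringWriter sink = new StringWriter(); // assumption: stands in for the FileWriter
        RecordWriter writer = new RecordWriter(sink);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 100; i++) {
            int id = i;
            pool.execute(() -> writer.writeRecord("<response id=\"" + id + "\"/>"));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // Each record should occupy exactly one line, in whatever order the threads ran.
        System.out.println(sink.toString().lines().count());
    }
}
```

With a real `FileWriter`, the underlying writer would also need to be closed once all tasks have finished, since buffered output that is never flushed or closed can be lost.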
Please tell me whether this code correctly achieves what I want. Suggestions for making it more efficient are also appreciated.
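For comparison, below is a rough sketch of an alternative shape: one task per link instead of long-running workers polling a queue, with a counter of pending tasks so the program can tell when there are no further children. Everything here is an assumption for illustration: `CrawlSketch`, `fetchChildren`, and `sink` are hypothetical names, `fetchChildren` stands in for the HTTP request plus XML parsing (and would presumably create its own handler per call rather than share one across threads), and the links are assumed to form a tree with no cycles.

```java
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Function;

public class CrawlSketch {
    private final ExecutorService executor = Executors.newFixedThreadPool(4);
    private final AtomicInteger pending = new AtomicInteger(); // tasks submitted but not yet finished
    private final CountDownLatch done = new CountDownLatch(1);
    private final Function<String, List<String>> fetchChildren; // hypothetical: link -> child links
    private final Consumer<String> sink;                        // hypothetical: must be thread-safe

    public CrawlSketch(Function<String, List<String>> fetchChildren, Consumer<String> sink) {
        this.fetchChildren = fetchChildren;
        this.sink = sink;
    }

    /** Blocks until the root and all of its descendants have been processed. */
    public void crawl(String root) {
        submit(root);
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdown();
        }
    }

    private void submit(String link) {
        pending.incrementAndGet(); // counted before the task is queued
        executor.execute(() -> {
            try {
                List<String> children = fetchChildren.apply(link);
                sink.accept(link);
                children.forEach(this::submit); // children are counted before this task is uncounted
            } catch (RuntimeException e) {
                e.printStackTrace();
            } finally {
                if (pending.decrementAndGet() == 0) {
                    done.countDown(); // nothing queued or running: the crawl is complete
                }
            }
        });
    }

    public static void main(String[] args) {
        // Assumption: an in-memory tree replaces the real HTTP + XML step.
        Map<String, List<String>> tree = Map.of("root", List.of("a", "b"), "a", List.of("c"));
        Queue<String> visited = new ConcurrentLinkedQueue<>();
        new CrawlSketch(link -> tree.getOrDefault(link, List.of()), visited::add).crawl("root");
        System.out.println(visited.size() + " links visited");
    }
}
```

Because each task registers its children before it decrements the counter, the counter can only reach zero after the last link has been handled, which gives a clean point to shut the executor down and close the output file.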