My program fetches my university's results page, finds all the links, and saves them to a file. It then reads that file, copies only the lines that contain the required links, and saves those to a second file. Finally, it parses the second file to extract the required data.

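import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.io.FileUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class net {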

    public static void main(String[] args) throws Exception {
        Document doc = Jsoup.connect("http://jntuconnect.net/results_archive/").get();

        Elements links = doc.select("a");
        File f1 = new File("flink.txt");
        File f2 = new File("rlink.txt");

        // write extracted links to f1 file
        FileUtils.writeLines(f1, links);

        // store each link from f1 file in string list
        List<String> linklist = FileUtils.readLines(f1);

        // second string list to store only required link elements
        List<String> rlinklist = new ArrayList<String>();

        // loop which finds required links and stores in rlinklist
        for (String elem : linklist) {
            if (elem.contains("B.Tech") && (elem.contains("R07") || elem.contains("R09"))) {
                rlinklist.add(elem);
            }
        }

        // store required links in f2 file
        FileUtils.writeLines(f2, rlinklist);

        // parse links from f2  file
        Document rdoc = Jsoup.parse(f2, null);
        Elements rlinks = rdoc.select("a");

        //  for storing hrefs and link text 
        List<String> rhref = new ArrayList<String>();
        List<String> rtext = new ArrayList<String>();

        for(Element rlink : rlinks){
            rhref.add(rlink.attr("href"));
            rtext.add(rlink.text());
        }

    }// end main

}

I don't want to create files to do this. Is there a better way to get hrefs and link texts of only specific urls without creating files?

The code uses Apache Commons IO (FileUtils) and jsoup.

  • You already have the list in memory (Elements links). Just operate on that. Your code to write and read from files is completely unnecessary. Commented Jul 11, 2012 at 4:24

1 Answer

Here's how you can get rid of the first file write/read:

Elements links = doc.select("a");
List<String> linklist = new ArrayList<String>();
for (Element elt : links) {
    linklist.add(elt.toString());
}

The second round trip, if I understand the code, is intended to extract the links that meet a certain test. You can just do that in memory using the same technique.
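As a rough sketch of that in-memory filter, the test from the question can be applied directly to the list of link strings. The class name `LinkFilter`, the method name `filterRequired`, and the sample strings are hypothetical and only for illustration; the condition itself is copied from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LinkFilter {

    // Keeps only the entries that mention "B.Tech" together with "R07" or "R09",
    // which is the same condition the question applies to each line of the file.
    static List<String> filterRequired(List<String> linklist) {
        List<String> rlinklist = new ArrayList<>();
        for (String elem : linklist) {
            if (elem.contains("B.Tech") && (elem.contains("R07") || elem.contains("R09"))) {
                rlinklist.add(elem);
            }
        }
        return rlinklist;
    }

    public static void main(String[] args) {
        // Hypothetical link strings standing in for the real page content.
        List<String> linklist = Arrays.asList(
                "<a href=\"/r07.html\">B.Tech R07 results</a>",
                "<a href=\"/mba.html\">MBA R09 results</a>");
        System.out.println(filterRequired(linklist));
    }
}
```

Because the list never leaves memory, neither `flink.txt` nor `rlink.txt` is needed for this step.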

I see you're relying on Jsoup.parse to extract the href and link text from the selected links. You can do that in memory by writing the selected nodes to a StringBuffer, converting it to a String by calling its toString() method, and then using one of the Jsoup.parse methods that takes a String instead of a File argument.
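A minimal sketch of that last step, assuming jsoup is on the classpath. The class name `InMemoryLinks`, the helper names, and the sample markup are hypothetical; `Jsoup.parse(String)`, `select`, `attr`, and `text` are the jsoup calls already used or named above. A `StringBuilder` is used here in place of `StringBuffer`, since no synchronization is needed.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class InMemoryLinks {

    // Same condition as in the question: "B.Tech" plus either "R07" or "R09".
    static boolean isRequired(String linkHtml) {
        return linkHtml.contains("B.Tech")
                && (linkHtml.contains("R07") || linkHtml.contains("R09"));
    }

    // Appends the matching <a> elements to a StringBuilder and parses the
    // resulting String, so no intermediate file is written.
    static Document parseRequiredLinks(Document doc) {
        StringBuilder html = new StringBuilder();
        for (Element link : doc.select("a")) {
            String linkHtml = link.toString();
            if (isRequired(linkHtml)) {
                html.append(linkHtml).append('\n');
            }
        }
        return Jsoup.parse(html.toString());
    }

    public static void main(String[] args) {
        // Hypothetical sample markup standing in for the downloaded page.
        String sample = "<a href=\"/r09.html\">B.Tech R09 results</a>"
                + "<a href=\"/mba.html\">MBA R09 results</a>"
                + "<a href=\"/r07.html\">B.Tech R07 results</a>";
        Document rdoc = parseRequiredLinks(Jsoup.parse(sample));

        // Collect hrefs and link texts exactly as the question does.
        List<String> rhref = new ArrayList<>();
        List<String> rtext = new ArrayList<>();
        for (Element rlink : rdoc.select("a")) {
            rhref.add(rlink.attr("href"));
            rtext.add(rlink.text());
        }
        System.out.println(rhref);
        System.out.println(rtext);
    }
}
```

Since each `Element` from the first `select("a")` already exposes `attr("href")` and `text()`, the re-parse could probably be skipped altogether by reading those values inside the filtering loop.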
