
I have been using jsoup for all my HTML scraping needs so far. However, I have run into a roadblock. Kickass gets the full list of files in each torrent by clicking a JavaScript link: <a href="javascript:getFiles('52261EB9480EDFD83B5B85C8C4817D28F3AE0C95', 1);" class="showmore folded">. I have traced the JavaScript function back to a *.js file that the site uses, but I am not sure how to mimic this behaviour. Ideally I would just grab the JavaScript link from the main page and get the list as I would with any other website, but jsoup seems to follow only HTML links, not JavaScript ones.
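Because the link is only a javascript: URL, its arguments can be pulled out of the href string with a regular expression; no JavaScript engine is needed for that step. The sketch below assumes the href always has exactly the form javascript:getFiles('HASH', PAGE); and the class name GetFilesLink is hypothetical. Reading the href from the page (for example with jsoup's attr("href")) is left out so the example needs only the JDK.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the torrent hash and page number from an href
// shaped like "javascript:getFiles('HASH', PAGE);". Assumes that exact shape.
class GetFilesLink {
    // Group 1: hex hash inside single quotes; group 2: integer page argument.
    private static final Pattern GET_FILES =
            Pattern.compile("getFiles\\('([0-9A-Fa-f]+)',\\s*(\\d+)\\)");

    final String hash;
    final int page;

    GetFilesLink(String hash, int page) {
        this.hash = hash;
        this.page = page;
    }

    // Returns Optional.empty() if the href does not contain a getFiles(...) call.
    static Optional<GetFilesLink> parse(String href) {
        Matcher m = GET_FILES.matcher(href);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new GetFilesLink(m.group(1), Integer.parseInt(m.group(2))));
    }

    public static void main(String[] args) {
        String href = "javascript:getFiles('52261EB9480EDFD83B5B85C8C4817D28F3AE0C95', 1);";
        // Prints the hash followed by " page 1"
        parse(href).ifPresent(link -> System.out.println(link.hash + " page " + link.page));
    }
}
```

This only recovers the arguments; what getFiles does with them still has to be read from the site's *.js file.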

So I tried HtmlUnit instead. I inspected the site in Chrome: https://kickass.to/australian-aria-top-50-singles-13-10-2014-t9702189.html

and copied the XPath expression. Currently the code below does not work. I would prefer not to pull in this library for a single function, but I can't get it to work at all anyway.

My Test Code:

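    import java.util.logging.Level;
    import java.util.logging.Logger;

    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlElement;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

    // Silence HtmlUnit's verbose logging
    Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
    WebClient webClient = new WebClient(BrowserVersion.CHROME);
    HtmlPage page = webClient.getPage("https://kickass.to/australian-aria-top-50-singles-13-10-2014-t9702189.html");

    // XPath copied from Chrome's inspector
    HtmlElement htmlElement = page.getFirstByXPath("//*[@id=\"ul_top\"]/tbody/tr[31]/td[2]/a");
    System.out.println(htmlElement.toString());
    htmlElement.click();
    webClient.waitForBackgroundJavaScript(1000);

    //get changes here
    webClient.closeAllWindows();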
  • Are you trying to download the torrent file using jsoup? Commented Apr 4, 2015 at 4:43
  • Using built-in libraries, actually. I am purely checking torrent information with jsoup and HtmlUnit. Commented Apr 4, 2015 at 4:51
  • Is JavaScript enabled for HtmlUnit? I have posted an alternative solution, but this question might also help: stackoverflow.com/questions/10136873/… Commented Apr 4, 2015 at 5:15

1 Answer


Jsoup does not execute JavaScript; it only parses the HTML the server returns. You should consider using Selenium with HtmlUnitDriver, which runs headless. I have tried out the sample code below, and the page source does contain the content that is displayed only after the JavaScript runs.

Sample code:

    import java.util.logging.Level;

    import org.apache.commons.logging.LogFactory;
    import org.openqa.selenium.htmlunit.HtmlUnitDriver;

    // Turn logging off before the driver starts
    LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");
    java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(Level.OFF);
    java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);

    // true = enable JavaScript
    HtmlUnitDriver driver = new HtmlUnitDriver(true);

    // Navigate to the page
    driver.get("https://kickass.to/australian-aria-top-50-singles-13-10-2014-t9702189.html");
    // Call the same function the "show more" link calls; executeScript needs no "javascript:" prefix
    driver.executeScript("getFiles('52261EB9480EDFD83B5B85C8C4817D28F3AE0C95', 1);");
    // These file names appear only after the JavaScript has run
    System.out.println(driver.getPageSource().contains("Australian ARIA Top 50 Singles 13.10.2014.pdf"));
    System.out.println(driver.getPageSource().contains("47. Sheppard - Geronimo.mp3"));
    //System.out.println(driver.getPageSource());
    driver.quit();

4 Comments

Yeah, that worked a lot better! HtmlUnit takes a while to initialise, but if I pass the driver through my functions and loops I only pay the startup cost once. Would it be worth dropping jsoup, since I only use it to go through the page source, and using HtmlUnit/Selenium instead so I don't have to download the page twice?
I was looking at PhantomJS as well, but I thought it was platform-dependent?
The PhantomJS developers run the GhostDriver project, which can be used from Java. GhostDriver is an implementation of WebDriver, and there are a few Stack Overflow questions about it. Since you are just scraping websites, GhostDriver or HtmlUnitDriver are both good options.
As it turns out, jsoup could simply send a POST request to the server and retrieve a JSON result.
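A minimal sketch of that approach, using the JDK's own java.net.http client (Java 11+) instead of jsoup so it runs without extra dependencies. The endpoint URL and the form field names (hash, page) are hypothetical placeholders; the real ones would have to be read from the getFiles function in the site's JavaScript or from the browser's network tab.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class GetFilesPost {
    // Encode fields as application/x-www-form-urlencoded, preserving map order.
    static String formEncode(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Build (but do not send) a form POST that asks for a JSON response.
    static HttpRequest buildRequest(String endpoint, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(fields)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint and field names: replace with what getFiles actually sends.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("hash", "52261EB9480EDFD83B5B85C8C4817D28F3AE0C95");
        fields.put("page", "1");
        HttpRequest request = buildRequest("https://example.com/get_files/", fields);

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body()); // raw JSON text
    }
}
```

Turning the JSON text in response.body() into objects still needs a JSON parser, which is left out here.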
