I'm a newbie to HtmlUnit, and I'm writing a demo script to load the source HTML of a webpage and write it to a txt file.

public static void main(String[] args) throws IOException {
    try (final WebClient wc = new WebClient(BrowserVersion.BEST_SUPPORTED)) {
        wc.getOptions().setThrowExceptionOnScriptError(false);
        
        final HtmlPage page = wc.getPage("https://www.sainsburys.co.uk/gol-ui/SearchResults/biscuits");
        WebResponse res = page.getWebResponse();
        String html = res.getContentAsString();
        
        // `dir` is not shown in this snippet; try-with-resources closes the writer even if write() fails
        try (FileWriter fw = new FileWriter(dir + "/pageHtml.txt")) {
            fw.write(html);
        }
    }
}

However, the HTML I get back is the fallback page shown when JavaScript is disabled. To try to fix this, I added this line to make sure JS is enabled on the WebClient:

        wc.getOptions().setJavaScriptEnabled(true);

Despite that, nothing changes. Am I being an idiot, or is there something more subtle that needs to change?

Thanks for any help! ^_^

  • What do you get? Are you confirming that you're waiting for asynchronous things to happen? (I suggest considering Geb as a DSL on top of Selenium, as it makes sorting these things out easier.) Commented Nov 2, 2021 at 18:41
  • Hi! I do get some HTML back, but it's essentially a plain HTML page telling me that I need to enable JavaScript. I tried to see if this was recreatable with a different page (this time the page for ASDA - it's for a uni project), and this time it tells me my browser is out of date. Kinda stuck here for ideas what to do. The context behind this is for web scraping for an android app, but Jsoup doesn't support JS I believe. I'll try the solution you provided! Commented Nov 2, 2021 at 18:51

1 Answer

WebResponse res = page.getWebResponse();
String html = res.getContentAsString();

This is the response (the source code) you got from the server, before any JavaScript has run. If you want the current DOM (the one after the JavaScript processing is done), you can do something like:

HtmlPage page = webClient.getPage(url);
webClient.waitForBackgroundJavaScript(60_000);

System.out.println(page.asXml());

or

System.out.println(page.asNormalizedText());
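
Since the goal in the question is to save the result to a file rather than print it, the string returned by page.asXml() can be written out the same way. A minimal sketch, assuming Java 11+ for Files.writeString; the class name RenderedPageSaver, the save method, and the placeholder markup in main are made up for illustration and are not part of HtmlUnit:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of HtmlUnit): saves markup that was already
// obtained elsewhere, e.g. from page.asXml() after waiting for background JS.
final class RenderedPageSaver {

    // Writes the markup to the target file as UTF-8, replacing any existing content.
    static void save(String markup, Path target) throws IOException {
        Files.writeString(target, markup, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder standing in for the string returned by page.asXml().
        String renderedHtml = "<html><body>placeholder</body></html>";
        Path target = Files.createTempFile("pageHtml", ".txt");
        save(renderedHtml, target);
        System.out.println("Wrote " + target);
    }
}
```

In the question's code this would be called with the rendered DOM instead of the raw response, for example RenderedPageSaver.save(page.asXml(), Path.of("pageHtml.txt")), after the waitForBackgroundJavaScript call.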