
I have a URL, and I want to fetch its page source after its JavaScript has executed.

Fetch Page source using HtmlUnit : URL got stuck

Initially I suspected that the URL was getting stuck because of limited system resources and high CPU usage.

Then I tried running it on HtmlUnit 2.9 and 2.11. It got stuck on both while parsing. See the question linked above for the HtmlUnit code that gets stuck.

Now I suspect that this might be due to JavaScript execution going into an infinite loop.

I want to find out which JS files are causing the problem and exclude them from execution.

If they are scripts for services like Google Analytics or Twitter, I may not need them at all.

So I want a way to tell HtmlUnit to ignore certain JS files and execute the rest.

Does anybody know how to do that?

1 Answer

Try this. It worked for me:

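import java.io.IOException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.util.FalsifyingWebConnection;

class InterceptWebConnection extends FalsifyingWebConnection {
    public InterceptWebConnection(WebClient webClient) throws IllegalArgumentException {
        super(webClient);
    }

    @Override
    public WebResponse getResponse(WebRequest request) throws IOException {
        // Check the URL before fetching, so the unwanted script is never downloaded.
        if (request.getUrl().toString().endsWith("dom-drag.js")) {
            // Hand HtmlUnit an empty script in place of the real file.
            return createWebResponse(request, "", "application/javascript", 200, "OK");
        }
        // Every other request goes through the wrapped connection as usual.
        return super.getResponse(request);
    }
}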

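Then add the following line while setting up your webClient, right after creating it. The constructor wraps the client's existing connection and installs itself as the client's web connection, so no further call is needed: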

new InterceptWebConnection(webClient);
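
The endsWith check above handles a single file. To skip several scripts, such as analytics or social widgets, the check could be moved into a small blocklist helper. The following is only a sketch: the class name ScriptBlocklist, the method isBlocked, and the host and file entries are all hypothetical and not part of HtmlUnit. It uses only the JDK.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: decides whether a script URL should get an empty response. */
final class ScriptBlocklist {
    // Assumed example entries; replace them with the scripts you want to skip.
    private static final List<String> BLOCKED_HOSTS =
            Arrays.asList("google-analytics.com", "platform.twitter.com");
    private static final List<String> BLOCKED_FILES = Arrays.asList("dom-drag.js");

    private ScriptBlocklist() { }

    static boolean isBlocked(String url) {
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return false; // Malformed URL: do not block, let the request proceed.
        }
        String host = uri.getHost() == null ? "" : uri.getHost().toLowerCase(Locale.ROOT);
        String path = uri.getPath() == null ? "" : uri.getPath();

        for (String blocked : BLOCKED_HOSTS) {
            // Match the host itself or any subdomain of it.
            if (host.equals(blocked) || host.endsWith("." + blocked)) {
                return true;
            }
        }
        for (String file : BLOCKED_FILES) {
            // Compare against the path only, so a query string does not defeat the match.
            if (path.endsWith("/" + file)) {
                return true;
            }
        }
        return false;
    }
}
```

Inside getResponse, the condition would then become something like ScriptBlocklist.isBlocked(request.getUrl().toString()). Matching on host and path separately avoids accidentally blocking an unrelated URL that merely contains one of the names.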

2 Comments

I also faced the same issue.
Hi, my web client is created like this: WebClient webClient = new WebClient(); Where should I add this interception?
