
In my Java code, I am trying to harvest a web page using the HtmlUnit library. My code is simple:

import java.io.IOException;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public static void main(String[] args) throws IOException
{
        // try-with-resources closes the client even if getPage() throws
        try (WebClient webClient = new WebClient()) {
                HtmlPage page = webClient.getPage("https://www.xxxxxxx.com/yyyyyy/");

                System.out.println(page.getTitleText());
        }
}

However, when I run the code, it throws the following exception:

Exception in thread "main" ======= EXCEPTION START ========
Exception class=[net.sourceforge.htmlunit.corejs.javascript.JavaScriptException]
com.gargoylesoftware.htmlunit.ScriptException: SyntaxError: with statements not allowed in strict mode (https://www.wtatennis.com/resources/v2.1.0/scripts/vendors.min.js#1)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:882)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:624)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:537)
    at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.callSecured(HtmlUnitContextFactory.java:354)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:762)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:738)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:103)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:1004)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:361)
    at com.gargoylesoftware.htmlunit.html.HtmlScript$2.execute(HtmlScript.java:234)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:256)
    at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.endElement(HtmlUnitNekoDOMBuilder.java:559)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.endElement(HtmlUnitNekoDOMBuilder.java:513)
    at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1192)
    at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1132)
    at net.sourceforge.htmlunit.cyberneko.filters.DefaultFilter.endElement(DefaultFilter.java:219)
    at net.sourceforge.htmlunit.cyberneko.filters.NamespaceBinder.endElement(NamespaceBinder.java:312)
    at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3185)
    at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2110)
    at net.sourceforge.htmlunit.cyberneko.HTMLScanner.scanDocument(HTMLScanner.java:937)
    at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:443)
    at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:394)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoDOMBuilder.parse(HtmlUnitNekoDOMBuilder.java:758)
    at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoHtmlParser.parse(HtmlUnitNekoHtmlParser.java:236)
    at com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoHtmlParser.parseHtml(HtmlUnitNekoHtmlParser.java:179)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:280)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:163)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:553)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:419)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:336)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:488)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:469)
    at htmlunit.WTAHarvester.main(WTAHarvester.java:27)
Caused by: net.sourceforge.htmlunit.corejs.javascript.JavaScriptException: SyntaxError: with statements not allowed in strict mode (https://www.wtatennis.com/resources/v2.1.0/scripts/vendors.min.js#1)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1215)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:1009)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:111)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:427)
    at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:340)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3607)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:123)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$2.doRun(JavaScriptEngine.java:753)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:867)
    ... 34 more
JavaScriptException value = SyntaxError: with statements not allowed in strict mode
======= EXCEPTION END ========

1 Answer


The problem comes from this file:

https://www.wtatennis.com/resources/v2.1.0/scripts/vendors.min.js#1

That file contains several minified libraries concatenated together. Among them is Underscore.js, which uses a with statement, as you can see in Underscore.js's source code.

But the file it is bundled into (the link above) also contains a "use strict"; directive. That directive puts the code in strict mode, where practices considered unsafe become errors, and the with statement is one of them: in strict mode it is a SyntaxError. Other people have run into this before, and it is fixable if you can change the scripts.
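A rough way to check whether a particular script hits this problem is to look for a "use strict" directive at the very start of the file together with a with statement anywhere in it. The Java sketch below (the class and method names are hypothetical) does that with regular expressions on the downloaded script text. It is only a heuristic: it ignores comments and string literals, treats something like `obj.with(` as a with statement, and misses a "use strict" that sits inside a function (which makes only that function strict).

```java
import java.util.regex.Pattern;

public class StrictWithCheck {

    // A "use strict" directive makes the whole script strict only when it is the
    // first statement, so this pattern is anchored at the start of the text.
    private static final Pattern LEADING_USE_STRICT =
            Pattern.compile("\\A\\s*([\"'])use strict\\1\\s*;?");

    // Rough match for a with statement: the keyword followed by an opening parenthesis.
    private static final Pattern WITH_STATEMENT =
            Pattern.compile("\\bwith\\s*\\(");

    /** Heuristic: true if the script starts in strict mode and appears to use "with". */
    public static boolean looksLikeStrictWithConflict(String script) {
        return LEADING_USE_STRICT.matcher(script).lookingAt()
                && WITH_STATEMENT.matcher(script).find();
    }

    public static void main(String[] args) {
        String strictWithWith = "\"use strict\";var o={a:1};with(o){a=2;}";
        String sloppyWithWith = "var o={a:1};with(o){a=2;}";
        System.out.println(looksLikeStrictWithConflict(strictWithWith)); // true
        System.out.println(looksLikeStrictWithConflict(sloppyWithWith)); // false
    }
}
```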

That said, I don't see the error when visiting that website's homepage. And even if I did, I assume you don't control the JavaScript that runs on this page. I don't know Java or the WebClient class you're using, but if you don't need to execute the page's JavaScript, maybe you can disable scripts:

webClient.getOptions().setJavaScriptEnabled(false);

3 Comments

Thanks. Disabling it does make the errors go away, but I need JavaScript to load some of the HTML elements I want to harvest, so I have to keep it enabled. I don't control the JS because it is not my website. So are you saying there is no solution?
@TravelingSalesman I don't see a clean way to do it unless you control the original website. Again, I don't know Java or this WebClient class. But if I had to do it myself with other tools, I would probably intercept the request for that file and alter the response to remove the "use strict"; statement. Not very clean, but it could get the job done. Maybe ExchangeFilterFunction could help with this? Edit: a comment under that answer suggests that it does not allow accessing the body, so this might be a dead end...
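The interception idea in the comment above comes down to rewriting the script text before HtmlUnit runs it. ExchangeFilterFunction belongs to Spring's reactive WebClient, a different class from HtmlUnit's WebClient; within HtmlUnit, one plausible hook is a subclass of `com.gargoylesoftware.htmlunit.util.WebConnectionWrapper` that overrides `getResponse(WebRequest)` and rebuilds the response for the vendors.min.js URL (check this against the HtmlUnit version you use). The stdlib-only sketch below shows just the rewriting step; the class and method names are hypothetical. Instead of deleting the directive, it replaces the string with a different one, so the surrounding syntax stays valid and the engine simply no longer recognizes it as the strict-mode directive.

```java
import java.util.regex.Pattern;

public class UseStrictNeutralizer {

    // Matches "use strict" or 'use strict' (same quote on both sides).
    // Regex-based, so it also rewrites the phrase inside ordinary string literals.
    private static final Pattern USE_STRICT =
            Pattern.compile("([\"'])use strict\\1");

    /**
     * Replaces every "use strict" string with "use sloppy", keeping the original quotes.
     * Only the exact text "use strict" acts as a directive, so the script runs non-strict.
     */
    public static String neutralizeUseStrict(String script) {
        return USE_STRICT.matcher(script).replaceAll("$1use sloppy$1");
    }

    public static void main(String[] args) {
        String original = "\"use strict\";var o={a:1};with(o){a=2;}";
        // prints "use sloppy";var o={a:1};with(o){a=2;}
        System.out.println(neutralizeUseStrict(original));
    }
}
```

Turning strict mode off can change behaviour elsewhere in the bundle (for example, `this` inside a plain function call is no longer `undefined`), so this is a workaround to try rather than a guaranteed fix.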
An even dirtier suggestion: instead of altering the body, you could alter the headers, setting the response status to 302 (Found) and the Location header to a URL where you host your own altered version of the script.
