I'm using Selenium with the HtmlUnit driver (JavaScript enabled) to read websites in Python. Unfortunately, I'm running into problems with websites whose JavaScript is not entirely clean. For example:

from selenium import webdriver

try:
    browser = webdriver.Remote(desired_capabilities=webdriver.DesiredCapabilities.HTMLUNITWITHJS)
    browser.get('https://www.ebay.com/')
    browser.close()
    print('success')
except Exception as e:
    print(e)

This raises an exception, as if the page's JavaScript errors were being passed through the webdriver to Python. Note that this does not happen with the Chrome, Firefox, or IE webdrivers.

Exception e:

TypeError: Cannot read property "classList" from undefined (script in https://www.ebay.com/ from (46, 26) to (73, 78)#70)
Stacktrace:
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError (ScriptRuntime.java:4130)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError (ScriptRuntime.java:4108)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError (ScriptRuntime.java:4141)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError2 (ScriptRuntime.java:4160)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.undefReadError (ScriptRuntime.java:4173)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getObjectProp (ScriptRuntime.java:1528)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop (Interpreter.java:1245)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret (Interpreter.java:815)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call (InterpretedFunction.java:111)
at net.sourceforge.htmlunit.corejs.javascript.NativeArray.iterativeMethod (NativeArray.java:1671)
at net.sourceforge.htmlunit.corejs.javascript.NativeArray.execIdCall (NativeArray.java:353)
at net.sourceforge.htmlunit.corejs.javascript.IdFunctionObject.call (IdFunctionObject.java:101)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop (Interpreter.java:1484)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret (Interpreter.java:815)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call (InterpretedFunction.java:111)
at net.sourceforge.htmlunit.corejs.javascript.NativeArray.iterativeMethod (NativeArray.java:1671)
at net.sourceforge.htmlunit.corejs.javascript.NativeArray.execIdCall (NativeArray.java:353)
at net.sourceforge.htmlunit.corejs.javascript.IdFunctionObject.call (IdFunctionObject.java:101)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop (Interpreter.java:1484)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret (Interpreter.java:815)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call (InterpretedFunction.java:111)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall (ContextFactory.java:417)
at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall (HtmlUnitContextFactory.java:325)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall (ScriptRuntime.java:3424)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec (InterpretedFunction.java:122)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun (JavaScriptEngine.java:781)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run (JavaScriptEngine.java:895)
at net.sourceforge.htmlunit.corejs.javascript.Context.call (Context.java:599)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call (ContextFactory.java:527)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute (JavaScriptEngine.java:790)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute (JavaScriptEngine.java:766)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute (JavaScriptEngine.java:757)
at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScript (HtmlPage.java:920)
at com.gargoylesoftware.htmlunit.html.HtmlScript.executeInlineScriptIfNeeded (HtmlScript.java:316)
at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded (HtmlScript.java:396)
at com.gargoylesoftware.htmlunit.html.HtmlScript$2.execute (HtmlScript.java:246)
at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage (HtmlScript.java:267)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement (HTMLParser.java:805)
at org.apache.xerces.parsers.AbstractSAXParser.endElement (None:-1)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement (HTMLParser.java:761)
at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.callEndElement (HTMLTagBalancer.java:1236)
at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.endElement (HTMLTagBalancer.java:1136)
at net.sourceforge.htmlunit.cyberneko.filters.DefaultFilter.endElement (DefaultFilter.java:226)
at net.sourceforge.htmlunit.cyberneko.filters.NamespaceBinder.endElement (NamespaceBinder.java:345)
at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scanEndElement (HTMLScanner.java:3178)
at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scan (HTMLScanner.java:2141)
at net.sourceforge.htmlunit.cyberneko.HTMLScanner.scanDocument (HTMLScanner.java:945)
at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse (HTMLConfiguration.java:521)
at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse (HTMLConfiguration.java:472)
at org.apache.xerces.parsers.XMLParser.parse (None:-1)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse (HTMLParser.java:1004)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parse (HTMLParser.java:253)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml (HTMLParser.java:195)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage (DefaultPageCreator.java:267)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage (DefaultPageCreator.java:158)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto (WebClient.java:524)
at com.gargoylesoftware.htmlunit.WebClient.getPage (WebClient.java:398)
at com.gargoylesoftware.htmlunit.WebClient.getPage (WebClient.java:315)
at org.openqa.selenium.htmlunit.HtmlUnitDriver.get (HtmlUnitDriver.java:670)
at org.openqa.selenium.htmlunit.HtmlUnitDriver.lambda$get$8 (HtmlUnitDriver.java:657)
at org.openqa.selenium.htmlunit.HtmlUnitDriver.lambda$runAsync$0 (HtmlUnitDriver.java:414)
at java.lang.Thread.run (None:-1)

I have found the following Java snippet, which looks like it should work:

WebClient client = new WebClient();
client.getOptions().setThrowExceptionOnScriptError(false);

I cannot figure out how to implement this in Python. Any advice?

1 Answer

Installing a custom error handler appears to solve the problem. Note that the handler below swallows every error the remote end reports, not only JavaScript errors. For example:

from selenium import webdriver
from selenium.webdriver.remote.errorhandler import ErrorHandler

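class MyHandler(ErrorHandler):
    def check_response(self, response):
        # Swallow every error reported by the remote end, including the
        # JavaScript errors that HtmlUnit would otherwise propagate.
        try:
            super(MyHandler, self).check_response(response)
        except Exception:
            pass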

try:
    browser = webdriver.Remote(desired_capabilities=webdriver.DesiredCapabilities.HTMLUNITWITHJS)
    browser.error_handler = MyHandler()
    browser.get('https://www.ebay.com/')
    browser.close()
    print('success')
except Exception as e:
    print(e)
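
Because the handler above discards every exception, it also hides genuine failures such as a missing element or a dead session. A narrower variant might suppress only errors that look like page-script failures and re-raise everything else. The following is a minimal sketch, not a tested drop-in: the names `SCRIPT_ERROR_PATTERN`, `is_page_script_error`, and `ignore_page_script_errors` are hypothetical, and the pattern assumes HtmlUnit always formats script errors like the "(script in ... from (46, 26) to (73, 78)#70)" line in the question's trace, which may not hold for every version.

```python
import re

# Assumption: HtmlUnit marks page-script failures with text such as
# "(script in <url> from (46, 26) to (73, 78)#70)", as in the trace above.
SCRIPT_ERROR_PATTERN = re.compile(
    r"\(script in \S+ from \(\d+, \d+\) to \(\d+, \d+\)#\d+\)"
)


def is_page_script_error(message):
    """Return True if the message looks like an HtmlUnit page-script error."""
    return bool(SCRIPT_ERROR_PATTERN.search(message or ""))


def ignore_page_script_errors(check, response):
    """Call check(response); swallow only page-script errors, re-raise the rest."""
    try:
        check(response)
    except Exception as exc:
        if not is_page_script_error(str(exc)):
            raise
```

Inside `MyHandler.check_response`, the body could then become a single call such as `ignore_page_script_errors(super(MyHandler, self).check_response, response)`, so that unrelated webdriver errors still reach the outer `try`/`except`.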