1

I have a question for you , which I think could possibly be solved using Selenium I have a set of URLs like for example the one below.

http://www.sears.com/search=little tikes&Little Tikes?filter=Brand&keywordSearch=false&vName=Toys+%26+Games&catalogId=12605&catPrediction=false&previousSort=ORIGINAL_SORT_ORDER&viewItems=50&storeId=10153&adCell=W3

if you paste the URL as it is in the browser it will end up redirecting to another URL , which you can verify in the address bar of the browser(Firefox for example). I need to get the redirected URL , regardless of if the redirect was from a javascript code or not is it possible to do this using the selenium framework ?

I have already tried using HTMLUnit for this however I get the following javascript execution error. Please help!

com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot call method "indexOf" of null (script in http://www.sears.com/search=little%20tikes&Little%20Tikes?filter=Brand&keywordSearch=false&catalogId=12605&adCell=W3&catPrediction=false&previousSort=ORIGINAL_SORT_ORDER&viewItems=50&storeId=10153&levels=Toys+%26+Games from (6942, 33) to (6974, 14)#6966)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:669) ~[htmlunit-2.12.jar:2.12]
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:601) ~[htmlunit-core-js-2.12.jar:?]
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:507) ~[htmlunit-core-js-2.12.jar:?]
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:555) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:530) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptIfPossible(HtmlPage.java:979) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeInlineScriptIfNeeded(HtmlScript.java:337) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:415) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:266) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:276) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:676) ~[htmlunit-2.12.jar:2.12]
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source) ~[xercesImpl-2.10.0.jar:?]
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:635) ~[htmlunit-2.12.jar:2.12]
    at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170) ~[nekohtml-1.9.18.jar:1.9.18]
    at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072) ~[nekohtml-1.9.18.jar:1.9.18]
    at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206) ~[nekohtml-1.9.18.jar:?]
    at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330) ~[nekohtml-1.9.18.jar:?]
    at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3074) ~[nekohtml-1.9.18.jar:1.9.18]
    at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2041) ~[nekohtml-1.9.18.jar:1.9.18]
    at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:918) ~[nekohtml-1.9.18.jar:1.9.18]
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499) ~[nekohtml-1.9.18.jar:1.9.18]
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452) ~[nekohtml-1.9.18.jar:1.9.18]
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) ~[xercesImpl-2.10.0.jar:?]
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:892) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:241) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:187) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:268) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:156) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:434) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:309) ~[htmlunit-2.12.jar:2.12]
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:374) ~[htmlunit-2.12.jar:2.12]
2
  • This question could be a duplicate of: stackoverflow.com/questions/20315330/… Commented Dec 31, 2013 at 16:15
  • 1
    does not seem to be a duplicate, the use-case is different click vs get and the exception stack trace is completely different as well Commented Dec 31, 2013 at 16:44

2 Answers 2

1

This should be easy if I have understood you question. Below are the steps

1. Get the FirefoxDriver object
2. call driver.get("http://www.sears.com/search=little tikes&Little Tikes?filter=Brand&keywordSearch=false&vName=Toys+%26+Games&catalogId=12605&catPrediction=false&previousSort=ORIGINAL_SORT_ORDER&viewItems=50&storeId=10153&adCell=W3");
This will open the url in firefox. On open the url will be forwarded to actual url. (This is as per my understanding from your description)
3. Then you can do driver.getCurrentUrl(). This will give you the url.

Let me know if this works for you :)

UPDATE :

        WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER_9);
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setRedirectEnabled(true);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setCssEnabled(true);     
        HtmlPage page = (HtmlPage) webClient.getPage("http://www.sears.com/search=little tikes&Little Tikes?filter=Brand&keywordSearch=false&vName=Toys+%26+Games&catalogId=12605&catPrediction=false&previousSort=ORIGINAL_SORT_ORDER&viewItems=50&storeId=10153&adCell=W3");
        WebResponse response = page.getWebResponse();
        String content = response.getContentAsString();
        System.out.println(page.getUrl());
Sign up to request clarification or add additional context in comments.

9 Comments

thanks ABP i will try this once I rule out HTMLUnit. But when you "This will open the url in firefox" will it open a window in an actual window or is it mimiced in the java program ? Thanks
It will open a Firefox browser instance.
how can this be done in a simulated fashion , meaning the java program simulates a web browser and not the actual browser ? Thanks.
If you do not want to open the Firefox and want to do the html open using the java program them you have to use HTMLUnitDriver. But HTMLUnitDriver have issues with Javascript, I have faced Javascript issues that I was not able to solve. Tried everything. Also please check the EDIT in my post. Added another code, might work for you.
Thanks! I am doing the exact samething , but using Forefox_17 instead, it works but its way too slow , so much so that I cannot even debug in eclipse . I need to parse many urls and need to scale , not sure what options I have though .
|
1

If you are using HTMLUnit Driver, then please enable JavaScript (it's set off by default) as shown below.

More over HTMLUnit uses Rhino as it's JavaScript engine which differs from other main stream browser JS engines.

HtmlUnitDriver Browser_Session= new HtmlUnitDriver();
Browser_Session.setJavascriptEnabled(true);

or

HtmlUnitDriver Browser_Session = new HtmlUnitDriver(true);

Below steps should fetch the redirected url.

Browser_Session.navigate().to("URL");
Browser_Session.getCurrentUrl(); //This fetches the current re-directed URL.

Hope this helps

1 Comment

Thanks! I was using HTMLUnit WebClient class with JavaScript enabled to true, apparently Rhino JS engine was throwing this exception , are you saying that if I use HTMLUnitDriver instead it will not cause this Javascript execution exception because it does not use Rhino engine ? Thanks.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.