HtmlUnit to get page source code implementation shows Exception

Question

I am trying to fetch a dynamic page from a URL in Java. I did this with Selenium, but it is slow because starting the Selenium driver takes a long time. That is why I switched to HtmlUnit, which is a GUI-less browser. But my HtmlUnit implementation throws an exception.

Questions:

1. How can I correct my HtmlUnit implementation?
2. Is the page produced by Selenium similar to the page produced by HtmlUnit? (Are both dynamic or not?)

My Selenium code:

    public static void main(String[] args) throws IOException {

        // Selenium
        WebDriver driver = new FirefoxDriver();
        driver.get("ANY URL HERE");
        String html_content = driver.getPageSource();
        driver.close();

        // Jsoup makes DOM here by parsing HTML content
        Document doc = Jsoup.parse(html_content);

        // OPERATIONS USING DOM TREE

    }

My HtmlUnit code:

package XXX.YYY.ZZZ.Template_Matching;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnit {

    public static void main(String[] args) throws Exception {
        //HtmlUnit htmlUnit = new  HtmlUnit();
        //htmlUnit.homePage();
        WebClient webClient = new WebClient();
        // Loads the page and executes its JavaScript
        HtmlPage currentPage = webClient.getPage("http://www.jabong.com/women/clothing/womens-tops/?source=women-leftnav");
        // asText() returns the rendered text of the page, not its markup
        String textSource = currentPage.asText();
        System.out.println(textSource);
    }
}

It throws an exception (the stack trace was posted as an image).

Stephen C · Accepted Answer · 2013-04-06 13:30:43Z

1

1: How can I correct my HtmlUnit implementation?

Judging from the stack trace, the JavaScript engine executed some JavaScript that tried to access an attribute of a JavaScript "undefined" value. If that is correct, it is a bug in the JavaScript you are testing, not in the HtmlUnit code.

2: Is the page produced by Selenium similar to the page produced by HtmlUnit?

That does not make sense. Neither Selenium nor HtmlUnit "produces" a page. The page is produced by the server code you are testing.

If you are asking whether HtmlUnit is capable of dealing with pages that have embedded JavaScript ... there is clear evidence in the stack trace that it is trying to execute the JavaScript.
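
To check what the server itself sends, before Selenium or HtmlUnit executes any JavaScript, you could fetch the raw response and compare it with what `getPageSource()` returns. A minimal sketch, assuming Java 11 or later for `java.net.http`; the class name `RawFetch` and the usage message are hypothetical, not something from the question:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: fetches a page without running any of its JavaScript.
class RawFetch {

    // Returns the response body as the server sent it.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java RawFetch <url>");
            return;
        }
        System.out.println(fetch(args[0]));
    }
}
```

If the raw body already contains the content you need, no browser engine is required at all. If it differs from the Selenium output, the difference was made by JavaScript running in the browser.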


3 Comments

devsda Over a year ago

My second question is answered: HtmlUnit is able to get the dynamic source of a URL. But for my first question, how can I resolve this? My task is to get the dynamic page of any URL on the web as a String. Please help me do that, or tell me another method that does the same.

devsda Over a year ago

Please help me with this problem.

Stephen C Over a year ago

Like I said, I think the stack trace is telling you there is a bug in the web page that your server is delivering. You resolve it by finding out WHY the JavaScript is trying to get an attribute from 'undefined'.
