Using Java, I want to log in to a website.

Authentication: the site has a JavaScript button that performs the redirection to the home page. My web crawler can log in programmatically to sites that have HTML buttons, using Jsoup. But when I try to log in to a website whose submit action is in JavaScript, none of the approaches I have found so far gets me authenticated.

So far I've tried:

  • I've tried to log in using the native Java API, with URLConnection and OutputStreamWriter. It fills the user and password fields with their proper values, but when I try to execute the JavaScript method, it simply doesn't work;
  • Jsoup. It can log me in to any website that uses HTML buttons, but since it doesn't support JavaScript, it won't help much;
  • I've tried HtmlUnit. Not only does it print a gazillion lines of output, it also takes a long time to run, and in the end it still fails;
  • Finally, I tried using Rhino (which HtmlUnit is based on) and got it to work for a long list of JavaScript methods, but I still cannot authenticate;
  • I have already tried Selenium as well, and got nowhere.

I'm running out of ideas. Maybe I haven't explored everything these APIs offer, but I still can't log in to a website that uses a JavaScript button. Does anyone have any ideas?

  • Try to put yourself in our shoes and read the question one more time. You are not saying much. Commented May 17, 2012 at 18:11
  • I don't know anything about Jsoup, Rhino or Selenium WebDriver, but if you can submit a form when there's a button, could you just submit the form directly? For example, in JavaScript, instead of document.getElementById('btnSubmit').click(), use document.forms.myForm.submit(). Commented May 17, 2012 at 18:24
  • I was just going to suggest something similar to Travesty3, but using Node.js and Zombie. Are you locked in to Java? Commented May 17, 2012 at 18:34
  • The basic issue is that web sites are implemented to be interpreted by web browsers. Therefore, in general, to be able to handle automated operation of arbitrary web pages, you have to emulate an actual web browser and provide the facilities that pages expect to be available. That's why a lot of people doing things like your project use WebKit as a component. How to do something like that from Java, I don't know, but no popular modern browsers are implemented in Java so that's an issue right from the start. Commented May 17, 2012 at 18:37
  • If all else fails, you can watch a successful login with Wireshark, watch the unsuccessful ones made by the various packages, and reverse engineer a solution from what you learn. It sounds like Rhino is closest, so I'd start there. FWIW, I agree with what's been said: anything short of WebKit or something else (like a browser) with a full JS implementation in the loop will be a kludge. Commented Jun 15, 2012 at 18:44

3 Answers

Use Selenium WebDriver to send JavaScript commands to the browser. I've successfully used it to reliably and repeatedly run hundreds of tests on complicated JavaScript/AJAX procedures on the client.

If you target a specific web page, you can customize the script and make it quite small.

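import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;

WebDriver driver; // Assigned elsewhere
JavascriptExecutor js = (JavascriptExecutor) driver;

// The script itself is JavaScript; the same submit can also be done through the WebDriver API directly
js.executeScript("document.getElementById('theform').submit();");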

Filling out the form is assumed to have been handled through the Selenium WebDriver API. You can also send commands to click() the right button, and so on.

Using Selenium WebDriver, you can also write <script> tags into the page in order to load larger libraries. Remember that you may have to wait until the browser has loaded the script files, both your own and the ones the original web page uses for its login procedure; loading and executing all of it can take seconds. Rather than sleeping for a fixed, overly long time, use the more reliable method of injecting a small script that checks whether everything else has been loaded (for example, the page scripts' status flags or the browser's status).
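The "check instead of sleep" idea can be sketched as a small polling helper. This is only an illustration using the JDK alone: the class name PollingWait, the method waitUntil and the stand-in condition in main are assumptions of this sketch, not part of Selenium or of the setup described above.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Minimal polling helper: re-checks a condition until it holds or a timeout expires. */
class PollingWait {

    /**
     * Returns true as soon as the condition holds, or false once the timeout
     * has elapsed (or the waiting thread is interrupted).
     */
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition (an assumption for this demo): becomes true after about 300 ms.
        boolean ready = waitUntil(
                () -> System.currentTimeMillis() - start >= 300,
                Duration.ofSeconds(5),
                Duration.ofMillis(50));
        System.out.println("ready = " + ready);
    }
}
```

With WebDriver, the condition could be something like () -> "complete".equals(js.executeScript("return document.readyState")), or a check of whatever status flag the page's own scripts set. Selenium also ships its own WebDriverWait class for this purpose, which is probably the better choice in real tests.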

1 Comment

@IgorBrusamolinLoboSantos: What browser did you use Selenium WebDriver with? I've had the most success with ChromeDriver, but Chrome, Firefox and IE all have their own Selenium WebDriver-specific quirks.

I suggest HtmlUnit:

HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc... just like you do in your "normal" browser.

It has fairly good JavaScript support (which is constantly improving) and is able to work even with quite complex AJAX libraries, simulating either Firefox or Internet Explorer depending on the configuration you want to use.

It is typically used for testing purposes or to retrieve information from web sites.


I had an issue that sounds similar (I had a login button that called a JavaScript method).

I used JMeter to observe what was being sent when I manually clicked the login button in a web browser (I imagine you could do this with Wireshark as well).

In my Java code, I created a PostMethod with all the parameters that were being sent.

import org.apache.commons.httpclient.methods.PostMethod;

PostMethod post = new PostMethod(WEB_URL); // URL of the login page
// The first argument is the name of the field on the login page,
// the second is the value being submitted for that field
post.addParameter(FIELD_USERNAME, "username");
post.addParameter(FIELD_PASSWORD, "password");

I then used HttpClient (org.apache.commons.httpclient.HttpClient) to execute the Post request.

One thing to note: there were 'hidden' parameters being passed that I could not see by manually looking at the login page. JMeter revealed them to me.
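A rough sketch of replaying such a captured login, hidden parameters included, might look like the following. It deliberately uses the JDK's built-in java.net.http.HttpClient (Java 11+) instead of the Apache Commons HttpClient mentioned above, so that it has no external dependencies. The class name FormLoginSketch, the field names and the values are all hypothetical; the real ones have to come from what the capture tool shows.

```java
import java.net.CookieManager;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class FormLoginSketch {

    /** Encodes fields as application/x-www-form-urlencoded, preserving insertion order. */
    static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** Builds the POST request that replays the captured login submission. */
    static HttpRequest buildLoginRequest(String loginUrl, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(loginUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: FormLoginSketch <login-url>");
            return;
        }
        // All names and values below are placeholders (assumptions of this sketch).
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "alice");
        fields.put("password", "s3cret");
        fields.put("csrf_token", "value-seen-in-the-capture"); // hypothetical hidden field

        // The CookieManager keeps the session cookie for requests made after the login.
        HttpClient client = HttpClient.newBuilder()
                .cookieHandler(new CookieManager())
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();

        HttpResponse<String> response = client.send(
                buildLoginRequest(args[0], fields),
                HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP status: " + response.statusCode());
    }
}
```

Reusing the same client instance for the pages fetched after the login matters here, because the session cookie lives in its CookieManager. If a hidden value changes on every page load, it would first have to be read from a fresh copy of the login page.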

I will be happy to clarify anything that seems unclear.
