21

There is a web page from which I want to retrieve a certain string. To do so, I need to log in, click some buttons, fill in a text box, and click another button; then the string appears.

How can I write a Java program to do that automatically? Are there any useful libraries for that purpose?

Thanks

2
  • Usually screen scraping works less well than using official APIs. What site are you trying to access? Commented Aug 23, 2010 at 17:17
  • I don't believe this site has an official API, but I'll check that option also. Commented Aug 23, 2010 at 17:23

5 Answers 5

28

Try HtmlUnit

HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc... just like you do in your "normal" browser.

Example code for submitting a form:

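// Imports for JUnit 4 and HtmlUnit 2.x (HtmlUnit 3.x uses the org.htmlunit package instead)
import org.junit.Test;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

@Test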
public void submittingForm() throws Exception {
    final WebClient webClient = new WebClient();

    // Get the first page
    final HtmlPage page1 = webClient.getPage("http://some_url");

    // Get the form that we are dealing with and within that form, 
    // find the submit button and the field that we want to change.
    final HtmlForm form = page1.getFormByName("myform");

    final HtmlSubmitInput button = form.getInputByName("submitbutton");
    final HtmlTextInput textField = form.getInputByName("userid");

    // Change the value of the text field
    textField.setValueAttribute("root");

    // Now submit the form by clicking the button and get back the second page.
    final HtmlPage page2 = button.click();

    webClient.closeAllWindows();
}

For more details check: http://htmlunit.sourceforge.net/gettingStarted.html


4 Comments

This sounds like exactly what I've been looking for. I'll check it out, thanks!
It's also rather slow and really liberal with warning messages.
AWESOME! With this approach I was able to write a Java application that accesses my bank's web site, logs in with my credentials, and prints my account balance and transactions to the Java console, all completely automatically!
Voting down one tick. I am looking for something similar - BUT - please do not tell me about some unavailable "framework". How is it done with POJO ?
2

The super simple way to do this is using HtmlUnit here:

http://htmlunit.sourceforge.net/

and what you want to do can be as simple as:

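// Imports for JUnit 4 and HtmlUnit 2.x (HtmlUnit 3.x uses the org.htmlunit package instead)
import static org.junit.Assert.assertEquals;
import org.junit.Test;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

@Test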
public void homePage() throws Exception {
    final WebClient webClient = new WebClient();
    final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net");
    assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText());
}

1

Take a look at the Apache HttpClient project, or if you need to run JavaScript on the page, try HttpUnit.
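
Since the page in question sits behind a login, whatever HTTP client you pick has to carry the session cookie from the login response into the later requests. Here is a minimal sketch of that idea. It uses the JDK's own `java.net.http.HttpClient` (Java 11+) rather than Apache HttpClient, purely to stay dependency-free; the class name, the `example.com` URLs and the `SESSION` cookie are made up for illustration.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.http.HttpClient;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: keeps a login session alive across requests.
final class SessionSketch {

    /** Builds a client that stores cookies from responses and replays them on later requests. */
    static HttpClient newSessionClient(CookieManager cookies) {
        return HttpClient.newBuilder()
                .cookieHandler(cookies)
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    /** Returns the Cookie header values the manager would attach to a request for the given URI. */
    static List<String> cookiesFor(CookieManager cookies, URI uri) throws IOException {
        return cookies.get(uri, Map.of()).getOrDefault("Cookie", List.of());
    }

    public static void main(String[] args) throws Exception {
        CookieManager cookies = new CookieManager(null, CookiePolicy.ACCEPT_ORIGINAL_SERVER);
        HttpClient client = newSessionClient(cookies); // use this one client for every request

        // Simulate what a login response would do (no network involved here):
        // the server sets a session cookie, and the manager remembers it.
        cookies.put(URI.create("http://example.com/login"),
                Map.of("Set-Cookie", List.of("SESSION=abc123; Path=/")));

        // Any later request to the same host would carry the cookie along.
        System.out.println(cookiesFor(cookies, URI.create("http://example.com/account")));
    }
}
```

The point is to reuse one client (and one cookie store) for the login request and everything after it; creating a fresh client per request would drop the session.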

0

When you press a button, the browser usually sends a request via the HTTP POST method, so you can use HttpClient to send the requests and HtmlParser to parse the response page that contains the string you need.
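
A rough sketch of that request/parse split, under some loud assumptions: it uses the JDK's `java.net.http.HttpClient` (Java 11+) in place of Apache HttpClient, and a small regex in place of a real HTML parser such as HtmlParser. The URL, the form field names and the `result` element id are all hypothetical; you would read the real ones from the page's HTML source.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch: submit a form with POST, then pull one string out of the response.
final class FormPostSketch {

    /** Encodes form fields as an application/x-www-form-urlencoded body. */
    static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** Builds the POST request a browser would send when the form's button is clicked. */
    static HttpRequest buildPost(String url, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
                .build();
    }

    /**
     * Returns the inner text of the first element with the given id.
     * Regex stand-in for an HTML parser: only handles double-quoted ids
     * and elements that do not nest another tag of the same name.
     */
    static Optional<String> textOfElementWithId(String html, String id) {
        Pattern p = Pattern.compile(
                "<(\\w+)[^>]*\\bid=\"" + Pattern.quote(id) + "\"[^>]*>(.*?)</\\1>",
                Pattern.DOTALL);
        Matcher m = p.matcher(html);
        return m.find() ? Optional.of(m.group(2).trim()) : Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("userid", "root");   // hypothetical field names and values
        fields.put("query", "a b");

        HttpRequest request = buildPost("http://example.com/search", fields); // hypothetical URL
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(textOfElementWithId(response.body(), "result").orElse("(not found)"));
    }
}
```

This only works when the page is plain HTML produced by the server. If the buttons trigger JavaScript that builds the page in the browser, a raw POST will not reproduce that, and a headless browser such as HtmlUnit is the better fit.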

0

Yes:
