When parsing a webpage, I get a link with href=http://www.onvista.de/aktien/snapshot.html?ID_OSI=36714349. When I open this link in my browser, the browser replaces it with "http://www.onvista.de/aktien/Adidas-Aktie-DE000A1EWWW0" and renders the page correctly. With Java, however, I fail to retrieve the page. I used the following sample, which was suggested here, to display redirected URLs.

import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class GetRedirected {

    public GetRedirected() throws MalformedURLException, IOException {
        String url="http://www.onvista.de/aktien/snapshot.html?ID_OSI=36714349";
        URLConnection con = new URL( url ).openConnection();
        System.out.println( "orignal url: " + con.getURL() );
        con.connect();
        System.out.println( "connected url: " + con.getURL() );
        InputStream is = con.getInputStream();
        System.out.println( "redirected url: " + con.getURL() );
        is.close();
    }

    public static void main(String[] args) throws Exception {
        new GetRedirected();
    }
}

But it fails at the "InputStream is =" statement with the error message below. How can I solve this? Any idea is welcome.

orignal url: www.onvista.de/aktien/snapshot.html?ID_OSI=36714349
connected url: www.onvista.de/aktien/snapshot.html?ID_OSI=36714349
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: www.onvista.de/aktien/snapshot.html?ID_OSI=36714349
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at de.gombers.broker....

2 Answers

1
You can retrieve the page with the following code, which sends browser-like request headers and follows the redirect manually:
package Test;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class HttpRedirectExample {

  public static void main(String[] args) {

    try {

    String url = "http://www.onvista.de/aktien/snapshot.html?ID_OSI=36714349";
//  String urlTest="https://api.twitter.com/oauth/authenticate";

    URL obj = new URL(url);
    HttpURLConnection conn = (HttpURLConnection) obj.openConnection();
    conn.setReadTimeout(5000);
    conn.addRequestProperty("Accept-Language", "en-US,en;q=0.8");
    conn.addRequestProperty("User-Agent", "Mozilla");
    conn.addRequestProperty("Referer", "google.com");

    System.out.println("Request URL ... " + url);

    boolean redirect = false;


    int status = conn.getResponseCode();
    if (status != HttpURLConnection.HTTP_OK) {
        if (status == HttpURLConnection.HTTP_MOVED_TEMP
            || status == HttpURLConnection.HTTP_MOVED_PERM
                || status == HttpURLConnection.HTTP_SEE_OTHER)
        redirect = true;
    }

    System.out.println("Response Code ... " + status);

    if (redirect) {

        // get redirect url from "location" header field
        String newUrl = conn.getHeaderField("Location");

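        // get the cookie if needed, e.g. for a login session
        String cookies = conn.getHeaderField("Set-Cookie");

        // open a new connection to the redirect target
        conn = (HttpURLConnection) new URL(newUrl).openConnection();
        // only forward the cookie if the server actually sent one
        if (cookies != null) {
            conn.setRequestProperty("Cookie", cookies);
        }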
        conn.addRequestProperty("Accept-Language", "en-US,en;q=0.8");
        conn.addRequestProperty("User-Agent", "Mozilla");
        conn.addRequestProperty("Referer", "google.com");

        System.out.println("Redirect to URL : " + newUrl);

    }

    BufferedReader in = new BufferedReader(
                              new InputStreamReader(conn.getInputStream()));
    String inputLine;
    StringBuffer html = new StringBuffer();

    while ((inputLine = in.readLine()) != null) {
        html.append(inputLine);
    }
    in.close();

    System.out.println("URL Content... \n" + html.toString());
    System.out.println("Done");

    } catch (Exception e) {
    e.printStackTrace();
    }

  }

}

0

A very common mistake: when the HTTP status code of an HttpURLConnection response indicates an error (AFAIK >= 400), calling getInputStream() throws an exception. You have to call getResponseCode() first and then decide whether to read from getInputStream() or getErrorStream().
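A minimal sketch of that check, in case it helps. The class name ResponseReader, the Result holder and the UTF-8 decoding are assumptions made for illustration; only getResponseCode(), getInputStream() and getErrorStream() come from HttpURLConnection itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
public class ResponseReader {

    /** Status code plus body, whichever stream the body came from. */
    public static final class Result {
        public final int status;
        public final String body;

        Result(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    public static Result fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            // getResponseCode() sends the request but does not throw on 4xx/5xx.
            int status = conn.getResponseCode();

            // For error statuses the body, if there is one, is on the error stream.
            InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
            return new Result(status, in == null ? "" : readAll(in));
        } finally {
            conn.disconnect();
        }
    }

    private static String readAll(InputStream in) throws IOException {
        try (InputStream stream = in) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = stream.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            // Assumption: UTF-8. A real client would honour the Content-Type charset.
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

With this, ResponseReader.fetch(url) on the URL from the question should hand back the 403 and whatever error page the server sent, instead of throwing at getInputStream().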

But actually I cannot reproduce your error; for me it works (though I use a tiny abstraction library called DavidWebb):

public void testAktienAdidas() throws Exception {

    Webb webb = Webb.create();
    Response<String> response = webb
            .get("http://www.onvista.de/aktien/snapshot.html?ID_OSI=36714349")
            .asString();

    assertEquals(200, response.getStatusCode());
    assertNotNull(response.getBody());
    assertTrue(response.getBody().contains("<!DOCTYPE html>"));
}

I don't get a redirect. Probably it is done client-side via JavaScript, or there is some server-side logic that evaluates HTTP headers such as User-Agent.

But if you experience redirects, you can tell HttpURLConnection to automatically follow them:

conn.setInstanceFollowRedirects(true);
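As far as I know, following redirects is already the default for HttpURLConnection, and it will not follow a redirect that switches protocol (for example http to https). If that matters, the hops can be followed by hand. The following is only a sketch: the class name RedirectResolver and the limit of 5 hops are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of any library.
public class RedirectResolver {

    private static final int MAX_HOPS = 5; // arbitrary limit to avoid redirect loops

    /** Returns the URL reached after following up to MAX_HOPS redirects by hand. */
    public static URL resolve(String start) throws IOException {
        URL current = new URL(start);
        for (int hop = 0; hop < MAX_HOPS; hop++) {
            HttpURLConnection conn = (HttpURLConnection) current.openConnection();
            conn.setInstanceFollowRedirects(false); // handle each hop ourselves
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            conn.disconnect();

            boolean isRedirect = status == HttpURLConnection.HTTP_MOVED_PERM
                    || status == HttpURLConnection.HTTP_MOVED_TEMP
                    || status == HttpURLConnection.HTTP_SEE_OTHER
                    || status == 307
                    || status == 308;
            if (!isRedirect || location == null) {
                return current; // final target, or an error the caller must inspect
            }
            // The two-argument constructor resolves a relative Location against the current URL.
            current = new URL(current, location);
        }
        throw new IOException("Too many redirects, last URL: " + current);
    }
}
```

Note that resolve() only reports where the chain ends. It does not check that the final status is 200, so the caller still has to inspect the response when fetching the resolved URL.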

1 Comment

You are right. Before initiating an action, it should have been tested whether the environment allows it. The displayed code was only put together to show the problem, though. Anyway, the problem is solved now. The web server had to be called with an appropriate User-Agent. Specifying "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0" for this purpose did the trick.
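For anyone landing here later, the fix described in this comment might look roughly like the sketch below. The class name BrowserLikeRequest and the 5-second timeouts are assumptions for illustration; the User-Agent string is the one quoted above.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of any library.
public class BrowserLikeRequest {

    // User-Agent string quoted in the comment above.
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0";

    /** Opens (but does not yet connect) a connection that identifies itself as a browser. */
    public static HttpURLConnection open(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        // Request properties must be set before the connection is established.
        conn.setRequestProperty("User-Agent", USER_AGENT);
        conn.setConnectTimeout(5000); // assumed timeout values
        conn.setReadTimeout(5000);
        return conn;
    }
}
```

Usage would be along the lines of HttpURLConnection conn = BrowserLikeRequest.open(url); followed by conn.getResponseCode() and, once the response has arrived, conn.getURL() to see where any redirect ended up.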
