
This is the code I used:

import java.net.HttpURLConnection;
import java.net.URL;

class ResponseCodeCheck
{
    public static void main(String[] args) throws Exception
    {
        URL url = new URL("http://www.amazon.co.jp/gp/seller/sell-your-stuff.html");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        connection.connect();

        // Read the HTTP status code returned by the server.
        int code = connection.getResponseCode();
        System.out.println("Response code of the object is " + code);
        if (code == 200)
        {
            System.out.println("OK");
        }
    }
}

It returns 404 for this URL, even though the URL works fine in a browser. Is there any reason why?

3
  • What does that mean? If I change the URL to 'google.com', the above code works fine. Commented Jul 3, 2012 at 10:43
  • You say the URL is working fine. Did you check in your browser? Does your browser access the internet through a proxy? Commented Jul 3, 2012 at 10:57
  • @PhilippReichart lol!!! Sorry, I've been sick the last few days and came back to work today, so I'm a bit off. Sorry! Commented Jul 3, 2012 at 11:38

2 Answers

2

Add a proper header value for "User-Agent":

connection.addRequestProperty("User-Agent", "Safari");
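
For context, here is a minimal sketch of where that line could sit in a complete program. The class name `UserAgentCheck` and the helper `open` are hypothetical names introduced for illustration; the sketch uses `setRequestProperty` rather than `addRequestProperty`, which replaces any existing value instead of appending to it. The header has to be set before the connection is established.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class UserAgentCheck {

    // Hypothetical helper: prepares (but does not yet send) a GET request
    // that carries the given User-Agent header.
    static HttpURLConnection open(String address, String userAgent) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(address).openConnection();
        connection.setRequestMethod("GET");
        // Request properties must be set before the connection is opened.
        connection.setRequestProperty("User-Agent", userAgent);
        return connection;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection connection =
                open("http://www.amazon.co.jp/gp/seller/sell-your-stuff.html", "Safari");
        // getResponseCode() connects implicitly and sends the request.
        System.out.println("Response code: " + connection.getResponseCode());
        connection.disconnect();
    }
}
```

The value "Safari" is just the one from the answer; which values the server accepts is up to the server.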

5 Comments

Interestingly, Amazon.co.jp seems to 404 any UA containing curl (probably to prevent scraping). Even a bogus UA like foo works.
Yeah, setting the UA gives 200, but is there any reason why this happens?
This most likely happens to prevent programs/spiders from downloading all catalogue data off Amazon.co.jp (which is most likely a violation of their TOS anyway).
I still don't understand why changing 'http' to 'https' gives 301.
My guess is that Amazon simply has some filter that checks the UA value.
0

Here is what curl reports:

curl -v http://www.amazon.co.jp/gp/seller/sell-your-stuff.html
* About to connect() to www.amazon.co.jp port 80 (#0)
*   Trying 176.32.120.128... connected
> GET /gp/seller/sell-your-stuff.html HTTP/1.1
> User-Agent: curl/7.23.1 (x86_64-pc-win32) libcurl/7.23.1 OpenSSL/0.9.8r zlib/1.2.5
> Host: www.amazon.co.jp
> Accept: */*
>
< HTTP/1.1 301 MovedPermanently

Please note the HTTP/1.1 301 MovedPermanently line. Are you sure you received 404 and not 301? This is common web practice: a 301 status means the content has moved to another location and the client (browser) should navigate to it.

Also, please make sure that your HttpURLConnection is set to follow redirects.
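
One way to check this is to turn automatic redirect handling off and look at the raw status and `Location` header yourself. The following is a rough sketch, not a definitive implementation: the class name `RedirectCheck` and the helper `isRedirect` are hypothetical. As far as I know, `HttpURLConnection` follows redirects by default, but it does not follow a redirect that switches protocol (for example from http to https), so inspecting the response manually can be useful.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class RedirectCheck {

    // Hypothetical helper: true for the common HTTP redirect status codes.
    static boolean isRedirect(int code) {
        return code == HttpURLConnection.HTTP_MOVED_PERM   // 301
            || code == HttpURLConnection.HTTP_MOVED_TEMP   // 302
            || code == HttpURLConnection.HTTP_SEE_OTHER    // 303
            || code == 307
            || code == 308;
    }

    public static void main(String[] args) throws IOException {
        URL url = new URL("http://www.amazon.co.jp/gp/seller/sell-your-stuff.html");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        // Disable automatic redirect handling so the original status is visible.
        connection.setInstanceFollowRedirects(false);

        int code = connection.getResponseCode();
        System.out.println("Response code: " + code);
        if (isRedirect(code)) {
            // The Location header names the URL the client is expected to request next.
            System.out.println("Redirects to: " + connection.getHeaderField("Location"));
        }
        connection.disconnect();
    }
}
```

Calling `setInstanceFollowRedirects(true)` instead restores automatic handling for this one connection.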

2 Comments

With -H "User-Agent: foo" you get the actual page content. A 403 Forbidden with a "please don't crawl us" message would have been a lot nicer on Amazon's part, though :/
