java.io.FileNotFoundException for valid URL

Question

I use library rome.dev.java.net to fetch RSS.

Code is

URL feedUrl = new URL("http://planet.rubyonrails.ru/xml/rss");
SyndFeedInput input = new SyndFeedInput();
SyndFeed feed = input.build(new XmlReader(feedUrl));

You can check that http://planet.rubyonrails.ru/xml/rss is valid URL and the page is shown in browser.

But I get exception from my application

java.io.FileNotFoundException: http://planet.rubyonrails.ru/xml/rss
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1311)
        at com.sun.syndication.io.XmlReader.<init>(XmlReader.java:237)
        at com.sun.syndication.io.XmlReader.<init>(XmlReader.java:213)
        at rssdaemonapp.ValidatorThread.run(ValidatorThread.java:32)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

I don't use any proxy. I get this exception on my PC and on the production server and only for this URL, other URLs are working.

Stephen C · Accepted Answer · 2010-05-08 13:17:50Z

8

The code that is throwing that exception looks like this ... assuming I've got the right version:

if (respCode >= 400) {
    if (respCode == 404 || respCode == 410) {
        throw new FileNotFoundException(url.toString());
    } else {
        throw new java.io.IOException(
            "Server returned HTTP"
            + " response code: " + respCode
            + " for URL: " + url.toString());
    }
}

In other words, when you are doing the GET from Java, you are getting a 404 or 410 response. Now when I do the request using the wget utility, I get a 200 response. So my guess is that the problem is one of the following:

You happened to make the request when they were suffering from some configuration problem.
They have implemented their server to return 404 / 410 for certain User-Agent strings.

Other possibilities are that they are doing some kind of server-side filtering on IP addresses or that there is some DNS problem that is causing your requests to go to a different IP address. But both of these seem to be contradicted by the fact that you can access the feed in your browser.

If this is the User-Agent, take a look at their terms of service to see if they have a banned certain kinds of use of their site / RSS feed.

answered May 8, 2010 at 13:17

Stephen C

723k95 gold badges849 silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Alexei Over a year ago

I tried to get page using apacha HttpClient and it works! See my answer.

ZZ Coder · Accepted Answer · 2010-05-08 13:03:31Z

4

I suspect it doesn't like Java. You need to fake your "User-Agent" header, not sure if it's doable with your RSS library.

Another suggestion is that you fetch the data yourself and feed the data to the feed reader.

answered May 8, 2010 at 13:03

ZZ Coder

75.7k30 gold badges139 silver badges169 bronze badges

Comments

Alexei · Accepted Answer · 2010-05-08 13:29:41Z

4

I tried this code

HttpClient httpClient = new DefaultHttpClient();
HttpGet pageGet = new HttpGet(feedUrl.toURI());
HttpResponse response = httpClient.execute(pageGet);
SyndFeedInput input = new SyndFeedInput();
SyndFeed feed = input.build(new XmlReader(response.getEntity().getContent()));

It works! Thank for your suggestions. Looks like this is about user-agent.

answered May 8, 2010 at 13:29

Alexei

2411 gold badge2 silver badges11 bronze badges

Collectives™ on Stack Overflow

java.io.FileNotFoundException for valid URL

3 Answers 3

1 Comment

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

1 Comment

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related