11

I use the ROME library (rome.dev.java.net) to fetch RSS feeds.

The code is:

URL feedUrl = new URL("http://planet.rubyonrails.ru/xml/rss");
SyndFeedInput input = new SyndFeedInput();
SyndFeed feed = input.build(new XmlReader(feedUrl));

You can check that http://planet.rubyonrails.ru/xml/rss is a valid URL and that the page loads in a browser.

But my application throws this exception:

java.io.FileNotFoundException: http://planet.rubyonrails.ru/xml/rss
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1311)
        at com.sun.syndication.io.XmlReader.<init>(XmlReader.java:237)
        at com.sun.syndication.io.XmlReader.<init>(XmlReader.java:213)
        at rssdaemonapp.ValidatorThread.run(ValidatorThread.java:32)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

I don't use any proxy. I get this exception both on my PC and on the production server, and only for this URL; other URLs work.

3 Answers

8

The code that is throwing that exception looks like this ... assuming I've got the right version:

if (respCode >= 400) {
    if (respCode == 404 || respCode == 410) {
        throw new FileNotFoundException(url.toString());
    } else {
        throw new java.io.IOException(
            "Server returned HTTP"
            + " response code: " + respCode
            + " for URL: " + url.toString());
    }
}

In other words, when you are doing the GET from Java, you are getting a 404 or 410 response. Now when I do the request using the wget utility, I get a 200 response. So my guess is that the problem is one of the following:

  • You happened to make the request when they were suffering from some configuration problem.
  • They have implemented their server to return 404 / 410 for certain User-Agent strings.

Other possibilities are that they are doing some kind of server-side filtering on IP addresses or that there is some DNS problem that is causing your requests to go to a different IP address. But both of these seem to be contradicted by the fact that you can access the feed in your browser.

If the cause is the User-Agent, take a look at their terms of service to see whether they have banned certain kinds of use of their site / RSS feed.
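
One way to test the User-Agent theory is to read the status code with getResponseCode(), which returns 404 / 410 as a number rather than throwing, and to compare two different User-Agent values. The following is only a sketch: the class UserAgentProbe, its helper names, and the local stand-in server (built on the JDK's com.sun.net.httpserver) are hypothetical, and the rule "reject agents starting with Java" is an assumption for illustration, not known behaviour of the real site.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

class UserAgentProbe {

    /** Sends a GET with the given User-Agent and returns the HTTP status code. */
    static int statusFor(URL url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        try {
            // getResponseCode() hands back 404/410 as a number, whereas
            // getInputStream() would throw FileNotFoundException for them.
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /** Hypothetical stand-in server: 404 if the User-Agent starts with "Java", else 200. */
    static HttpServer startPickyServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/xml/rss", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            boolean rejected = ua != null && ua.startsWith("Java");
            exchange.sendResponseHeaders(rejected ? 404 : 200, -1); // -1: no body
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startPickyServer();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/xml/rss");
            System.out.println("Java-like agent:    " + statusFor(url, "Java/1.6.0"));
            System.out.println("Browser-like agent: " + statusFor(url, "Mozilla/5.0"));
        } finally {
            server.stop(0);
        }
    }
}
```

Pointing statusFor at the real feed URL with the same two agent strings should show whether that server treats them differently.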


1 Comment

I tried fetching the page with Apache HttpClient and it works! See my answer.

4

I suspect the server doesn't like Java's default User-Agent. You need to send a different "User-Agent" header; I'm not sure whether that's doable with your RSS library.

Another suggestion is that you fetch the data yourself and feed the data to the feed reader.
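
A minimal sketch of that second suggestion, using only the JDK: open the connection yourself, set the header, and hand the resulting stream to the parser. The class name FeedFetcher, the agent string "MyFeedReader/1.0", and the 10-second timeouts are assumptions for illustration. The ROME call is left as a comment so the example compiles without the library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

class FeedFetcher {

    /** Opens the URL with an explicit User-Agent and returns the response body stream. */
    static InputStream openWithUserAgent(URL url, String userAgent) throws IOException {
        URLConnection conn = url.openConnection();
        // Replaces the JDK's default agent string for this request.
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setConnectTimeout(10_000); // assumed values, tune as needed
        conn.setReadTimeout(10_000);
        return conn.getInputStream();
    }

    public static void main(String[] args) throws IOException {
        URL feedUrl = new URL(args.length > 0 ? args[0] : "http://planet.rubyonrails.ru/xml/rss");
        try (InputStream in = openWithUserAgent(feedUrl, "MyFeedReader/1.0")) {
            // With ROME on the classpath, the stream could be parsed instead:
            //   SyndFeed feed = new SyndFeedInput().build(new XmlReader(in));
            long total = 0;
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
            System.out.println("Read " + total + " bytes from " + feedUrl);
        }
    }
}
```

If the server still answers 404 with a custom agent string, the User-Agent is probably not the cause.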

4

I tried this code:

HttpClient httpClient = new DefaultHttpClient();
HttpGet pageGet = new HttpGet(feedUrl.toURI());
HttpResponse response = httpClient.execute(pageGet);
SyndFeedInput input = new SyndFeedInput();
SyndFeed feed = input.build(new XmlReader(response.getEntity().getContent()));

It works! Thanks for your suggestions. It looks like the problem is the User-Agent.
