Java: error while parsing an RSS feed

Question

Here is the code:

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public static void main(String[] args) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(true);
    factory.setIgnoringElementContentWhitespace(true);
    DocumentBuilder builder = factory.newDocumentBuilder();

    Document doc = builder.parse("http://rss.adnkronos.com/RSS_Politica.xml");

    NodeList nodes = doc.getElementsByTagName("title");

    for (int k = 0; k < nodes.getLength(); k++) {
        System.out.print(nodes.item(k));
    }
}

The RSS feed URL is: http://rss.adnkronos.com/RSS_Politica.xml

The console output is:

null null null null null null null null null null null null null null null null null null null null null

As you can see in the XML, the title nodes obviously do have values, so they should not be null.

After that output, the following errors are shown (translated from Italian):

Error: URI=http://rss.adnkronos.com/RSS_Politica.xml Line=1: The root element "rss" must match the root DOCTYPE "null".

Error: URI=http://rss.adnkronos.com/RSS_Politica.xml Line=1: Document is invalid: no grammar found.

Scott McMaster · Accepted Answer · 2018-04-18 17:48:19Z

There are two problems. Let's take care of the one you probably care most about first.

The nodes in your NodeList are Element nodes. The actual Text nodes are their children. So to get the values you want, you can do:

nodes.item(k).getFirstChild().getNodeValue()

Or (in this case):

nodes.item(k).getTextContent()

Personally, I think the former is slightly more robust for general parsing, because getTextContent() concatenates the text of all descendant nodes if there happens to be more than one.
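To make the difference concrete, here is a small self-contained sketch. It parses a hypothetical <title> with mixed content (not taken from the actual feed) and compares the element's own node value, its first child's value, and getTextContent(). The class and helper names (TextContentDemo, parse, firstChildValue) are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TextContentDemo {

    // Parses an XML string with a default (non-validating) parser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Value of the first child node only; null-safe in case the element is empty.
    static String firstChildValue(Node element) {
        Node child = element.getFirstChild();
        return child == null ? null : child.getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical <title> with mixed content: text, a child element, more text.
        Document doc = parse("<title>Hello <b>big</b> world</title>");
        Node title = doc.getElementsByTagName("title").item(0);

        // Per the DOM spec, an Element's own nodeValue is always null.
        System.out.println(title.getNodeValue());                // null
        System.out.println("[" + firstChildValue(title) + "]");  // [Hello ]
        System.out.println("[" + title.getTextContent() + "]");  // [Hello big world]
    }
}
```

For plain RSS titles, which usually hold a single text node, both approaches return the same string. They only diverge when an element has several child nodes, as in this example.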

As for the validation errors: setValidating(true) turns on DTD validation, so the parser expects the document to declare a DTD through a DOCTYPE. This feed has none, and the parser complains about it. The tl;dr is to use setValidating(false).
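Putting both fixes together, a minimal sketch of the question's program might look like this, assuming the feed is well-formed XML. It turns validation off and reads each title with getTextContent(). The class name RssTitles and the helper titles() are illustrative, not from the original post. The setIgnoringElementContentWhitespace(true) call is dropped because, according to its Javadoc, it only has an effect when the parser is validating.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RssTitles {

    // Parses an RSS document without DTD validation and returns the text of every <title>.
    static List<String> titles(InputSource source) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // the feed declares no DOCTYPE, so DTD validation can only fail
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(source);

        NodeList nodes = doc.getElementsByTagName("title");
        List<String> result = new ArrayList<>();
        for (int k = 0; k < nodes.getLength(); k++) {
            // Read the element's text instead of printing the Element object itself.
            result.add(nodes.item(k).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Same feed URL as in the question; this needs network access.
        for (String title : titles(new InputSource("http://rss.adnkronos.com/RSS_Politica.xml"))) {
            System.out.println(title);
        }
    }
}
```

Passing an InputSource rather than a URL string makes it easy to try the helper on an in-memory document, for example new InputSource(new StringReader(xml)).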

If you really want to validate the RSS, you should try to find an unofficial (because there is no official one) XSD schema file and set that up in your DocumentBuilderFactory. Using an XSD for RSS in this context is probably not worthwhile, though, because half the RSS on the Internet, while perfectly usable, would probably fail validation :).
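If you do go the XSD route, the standard JAXP way to attach a schema is DocumentBuilderFactory.setSchema(Schema), keeping setValidating(false) so that DTD validation stays off. The sketch below uses a deliberately tiny, hypothetical schema (TOY_RSS_XSD) that only checks that <rss> contains one <channel>. It is not an RSS schema, official or unofficial, and exists only to show the wiring. The class and method names are illustrative.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class RssSchemaCheck {

    // HYPOTHETICAL toy schema: <rss> must contain exactly one <channel>; channel content is skipped.
    static final String TOY_RSS_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='rss'><xs:complexType>"
      + "<xs:sequence><xs:element name='channel'><xs:complexType><xs:sequence>"
      + "<xs:any minOccurs='0' maxOccurs='unbounded' processContents='skip'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:sequence>"
      + "<xs:attribute name='version' type='xs:string'/>"
      + "</xs:complexType></xs:element></xs:schema>";

    // Returns true if the XML parses and validates against the given XSD.
    static boolean isValid(String xml, String xsd) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false); // no DTD validation; the Schema does the checking
        factory.setSchema(schema);

        DocumentBuilder builder = factory.newDocumentBuilder();
        // Turn validation errors into exceptions instead of printed messages.
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) throws SAXException { throw e; }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<rss version='2.0'><channel><title>T</title></channel></rss>", TOY_RSS_XSD)); // true
        System.out.println(isValid("<rss version='2.0'><item/></rss>", TOY_RSS_XSD)); // false
    }
}
```

A real schema would replace TOY_RSS_XSD, loaded for example from a file with new StreamSource(new File(...)), and, as noted above, many real-world feeds may still fail a strict one.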

mavriksc · Accepted Answer · 2018-04-18 17:18:10Z


Look into the validation options for the errors you are getting. As for the nulls for title, it seems that toString() on Node prints the node's value, and getNodeValue() is always null for Element nodes. If you change the call to System.out.print(nodes.item(k).getTextContent());, it will print the titles.
