
I am trying to parse an RSS feed with SAX, but I'm stuck on the description: my program stops reading the description content when it encounters an apostrophe (').

Code to parse the XML:

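import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.apache.commons.lang3.StringEscapeUtils; // assumed: Apache Commons Lang 3 on the classpath
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class RSSAX {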

String channel_title="";

public void displayRSS()
{

    try {

        SAXParserFactory spf =  SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        sp.parse("http://www.ronkaplansbaseballbookshelf.com/feed/podcast/", new RSSHandler());


    } catch (Exception e) {
        // TODO: handle exception
        System.out.println("Message is "+e.getMessage());
    }

}

private class RSSHandler extends DefaultHandler
{
    private boolean isItem = false;
    private String tagName=""; 

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        this.tagName= qName;
        if(qName.equals("item"))
        {
            this.isItem=true;
        }

    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
         this.tagName="";
         if(qName.equals("item"))
         {
             System.out.println("========================");
             this.isItem=false;
         }


    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {

        if(this.isItem)
        {
            //System.out.println("tagname is "+this.tagName);
            if(this.tagName.equals("title"))
            {
                System.out.println("title is "+(new String(ch,start,length)));
                this.tagName="";
            }
            else if(this.tagName.equals("link"))
            {
                System.out.println("link is "+(new String(ch,start,length)));
                this.tagName="";
            }
            else if(this.tagName.equals("description"))
            {
                String test=(new String(ch,start,length)).replaceAll("\\<.*?>","");
                test=StringEscapeUtils.escapeXml(StringEscapeUtils.unescapeXml(test));
                System.out.println("description is "+test);
                this.tagName="";
            }
            else if(this.tagName.equals("comments"))
            {
                System.out.println("comment link is "+(new String(ch,start,length)));
                this.tagName="";
            }
            else if(this.tagName.equals("pubDate"))
            {
                System.out.println("pubDate is "+(new String(ch,start,length)));
                this.tagName="";
            }
            else if(this.tagName.equals("category"))
            {
                System.out.println("Category is "+(new String(ch,start,length)));
                this.tagName="";
            }
            else if(this.tagName.equals("content:encoded"))
            {
                System.out.println("content:encoded is "+(new String(ch,start,length)));
                //this.tagName="";
            }

        }

    }

}

}



Output:

title is The Bookshelf Conversation: Filip Bondy
link is http://www.ronkaplansbaseballbookshelf.com/2015/08/04/the-bookshelf-conversation-filip-bondy/
pubDate is Tue, 04 Aug 2015 14:31:45 +0000
comment link is http://www.ronkaplansbaseballbookshelf.com/2015/08/04/the-bookshelf-conversation-filip-bondy/#comments
Category is 2015 title
Category is Author profile/interview by Ron Kaplan

description is My New Jersey landsman and veteran sportswriter Filip Bondy has crafted a fun volume on one of the most famous games in the history of the national pastime. Whenever there

It stops printing the description at the apostrophe in "there's"; the rest of the text is missing.

  • What is the exception? (commented Nov 17, 2015 at 5:35)

2 Answers


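A SAX parser can break up a text node any way it likes and deliver the content in multiple calls to the characters() method. It's your job to reassemble the pieces. In your handler, the first call prints its chunk and then sets tagName to "", so every later chunk of the same description is ignored.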
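A minimal sketch of that reassembly, assuming the same JDK SAX setup as the question: characters() only appends to a StringBuilder, and the text is used in endElement(), when the element is known to be complete. The class name DescriptionCollector and its parse(String) helper are hypothetical, and a small inline feed stands in for the real URL.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of every <description> inside an <item>.
public class DescriptionCollector extends DefaultHandler {
    // Buffer for the text of the element currently being read.
    private final StringBuilder text = new StringBuilder();
    private final List<String> descriptions = new ArrayList<>();
    private boolean inItem = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("item")) {
            inItem = true;
        }
        // Start a fresh buffer for every element.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node: append, never print here.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (inItem && qName.equals("description")) {
            // The element is closed, so the buffer now holds the complete text.
            descriptions.add(text.toString());
        } else if (qName.equals("item")) {
            inItem = false;
        }
        text.setLength(0);
    }

    public List<String> getDescriptions() {
        return descriptions;
    }

    // Hypothetical helper: parses an in-memory document instead of a URL.
    public static List<String> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DescriptionCollector handler = new DescriptionCollector();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getDescriptions();
    }
}
```

For the live feed, the handler can be passed to sp.parse(url, handler) exactly as in the question, with getDescriptions() read afterwards; the tag stripping and unescaping can then run on the complete string. The inItem flag keeps the channel-level <description> out of the results.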


2 Comments

Can you suggest the change needed in my code to fetch the entire description?
Any tutorial on SAX will explain this information. I've written a few myself in various XML books. I'm not going to write another just for you.

You can use a StAX parser instead. To force XMLStreamReader to return each text node as a single string, set this property on the XMLInputFactory:

factory.setProperty("javax.xml.stream.isCoalescing", true);

With coalescing enabled, contiguous element content (including CDATA sections) is reported as a single CHARACTERS event; see the XMLStreamReader.next() documentation.
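A minimal sketch of the StAX approach, assuming the standard javax.xml.stream API that ships with the JDK. StaxDescriptions and readText are hypothetical names, and the feed is an inline string rather than the real URL. XMLInputFactory.IS_COALESCING is the constant for "javax.xml.stream.isCoalescing".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical class: pulls the text of every <description> inside an <item>.
public class StaxDescriptions {

    public static List<String> parse(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the reader to merge contiguous text into one CHARACTERS event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));

        List<String> descriptions = new ArrayList<>();
        boolean inItem = false;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("item")) {
                        inItem = true;
                    } else if (inItem && name.equals("description")) {
                        descriptions.add(readText(reader));
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("item")) {
                    inItem = false;
                }
            }
        } finally {
            reader.close();
        }
        return descriptions;
    }

    // Reads the text that follows a start tag. With coalescing on, the content
    // normally arrives as a single event, so this loop usually runs once; it
    // only guards against the text arriving in more than one event.
    private static String readText(XMLStreamReader reader) throws XMLStreamException {
        StringBuilder sb = new StringBuilder();
        int event = reader.next();
        while (event == XMLStreamConstants.CHARACTERS
                || event == XMLStreamConstants.CDATA
                || event == XMLStreamConstants.SPACE) {
            sb.append(reader.getText());
            event = reader.next();
        }
        return sb.toString();
    }
}
```

For the real feed, factory.createXMLStreamReader can be given an InputStream opened from the URL instead of a StringReader.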
