
I have a piece of content that contains both HTML and RSS. I would like to separate them and store each in its own string, so I am trying to locate the opening and closing tags and grab everything between <rss> and </rss>.

The code works fine for <html> and </html>. However, I get an exception for <rss> and </rss>.

Below is my code snippet.

// parse the responseStr to html
html = responseStr.substring(responseStr.indexOf("<html>"),
responseStr.lastIndexOf("</html>") + 7);
System.out.println("html string"+html );

Can someone please tell me what is wrong with the code below?

// parse the responseStr to rss
rss = responseStr.substring(responseStr.indexOf("<rss version="2.0">"),
responseStr.lastIndexOf("</rss>") + 6);
System.out.println("rss string = "+rss );

I get the below exception:

  java.lang.StringIndexOutOfBoundsException
    at java.lang.String.substring(String.java:1093)
  • What do you mean by "I am seeing errors"? Also, can you post the text you're trying to parse? Commented Aug 26, 2013 at 18:13
  • Why not use a library? An XML parser, at the very least, would allow you to use XPath. Commented Aug 26, 2013 at 18:13
  • What errors do you see? Add them in the question please Commented Aug 26, 2013 at 18:14
  • The above code works for me if your input string is <rss> ... </rss>. Please post your input string. Commented Aug 26, 2013 at 18:18
  • Chances are the index responseStr.lastIndexOf("</rss>") + 6 doesn't exist. Commented Aug 26, 2013 at 18:20

3 Answers

It is likely that your call to substring is being passed invalid indexes for your responseStr. You need to verify that your string actually contains the <rss> and </rss> tags before you call substring.

Try this:

String result;
int start = responseStr.indexOf("<rss>");
int end = responseStr.lastIndexOf("</rss>");

if (start != -1 && end != -1)
{
  result = "rss string = " + responseStr.substring(start, end + 6);
}
else
{
  result = "rss string not found";
}

System.out.println(result);

From the JavaDocs for String.indexOf, we know that if the string does not occur, -1 will be returned.
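
To make the link between that -1 and the exception in the question concrete, here is a minimal sketch. The class name, helper method, and sample string are invented for illustration and are not from the question.

```java
// Hypothetical demo class; the sample input below is made up for illustration.
class IndexOfDemo {

    // Returns true if substring(start, end) would throw for the given tag pair.
    static boolean throwsOnSubstring(String text, String openTag, String closeTag) {
        int start = text.indexOf(openTag);      // -1 when openTag does not occur
        int end = text.lastIndexOf(closeTag);   // -1 when closeTag does not occur
        try {
            text.substring(start, end + closeTag.length());
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String text = "<rss version=\"2.0\"><channel/></rss>";

        // The exact string "<rss>" never occurs, because the tag carries an attribute.
        System.out.println(text.indexOf("<rss>"));                       // -1

        // A start index of -1 is out of bounds for substring.
        System.out.println(throwsOnSubstring(text, "<rss>", "</rss>"));  // true

        // Searching for the tag prefix finds the opening tag.
        System.out.println(throwsOnSubstring(text, "<rss", "</rss>"));   // false
    }
}
```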


7 Comments

I do have rss string, however, when I use your code it prints rss string not found.
@smiley If "rss string not found" is being printed, then one or both of the rss tags are missing. You need to inspect your string. You can also alter the above code to tell you which of the tags (opening or closing) is missing specifically.
ah.. I just noticed I am getting rss tag as <rss version="2.0"> and not as <rss> as someone mentioned above. How do I specify it in the code? I think I might need escape characters since I am not able to use version="2.0" directly.
Thank you very much Luke! I tried this, and I know it's pretty crude: responseStr.indexOf("<rss version="). It worked for me though :) Thanks for all your help!
If you're going to just use indexOf, I would recommend using "<rss" as your search string, rather than "<rss version=".
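
Putting the suggestions from these comments together, a sketch along the following lines searches for "<rss" and reports which tag is missing. The class and method names are hypothetical, and note that the prefix "<rss" would also match any other tag whose name merely starts with "rss".

```java
// Hypothetical helper; names are illustrative, not taken from the answer above.
class RssExtractor {

    private static final String CLOSE_TAG = "</rss>";

    // Returns the text from the opening <rss ...> tag through the last </rss>.
    static String extractRss(String responseStr) {
        // Match the tag prefix so that attributes such as version="2.0" do not matter.
        int start = responseStr.indexOf("<rss");
        if (start == -1) {
            throw new IllegalArgumentException("opening <rss ...> tag not found");
        }
        int end = responseStr.lastIndexOf(CLOSE_TAG);
        if (end == -1) {
            throw new IllegalArgumentException("closing </rss> tag not found");
        }
        if (end < start) {
            throw new IllegalArgumentException("closing </rss> tag appears before the opening tag");
        }
        return responseStr.substring(start, end + CLOSE_TAG.length());
    }

    public static void main(String[] args) {
        String responseStr = "<html><body/></html><rss version=\"2.0\"><channel/></rss>";
        System.out.println("rss string = " + extractRss(responseStr));
    }
}
```

Throwing with a specific message is only one way to surface the problem; returning null or an Optional would work just as well if a missing feed is an expected case.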

I think it would be easier to use the following method from Apache Commons Lang (see its Javadoc):

StringUtils.substringsBetween(String str,String open,String close)

Example:

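String[] rss = StringUtils.substringsBetween(testHtml, "<rss>", "</rss>");
if (rss != null) { // substringsBetween returns null when nothing matches
    for (String s : rss) {
        System.out.println("rss: " + s); // print the element, not the array
    }
}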

For reference, the single-match variant, substringBetween, is implemented in Commons Lang roughly like this:

public static String substringBetween(String str, String open, String close) {
    if (str == null || open == null || close == null) {
        return null;
    }
    int start = str.indexOf(open);
    if (start != INDEX_NOT_FOUND) {
        int end = str.indexOf(close, start + open.length());
        if (end != INDEX_NOT_FOUND) {
            return str.substring(start + open.length(), end);
        }
    }
    return null;
}


I would recommend using an XML parser, but if you want to stay with plain string operations, the code below works:

public static void main(String[] args) {
    String responseStr = "<rss ...>------content-----</rss>";
    int start = responseStr.indexOf("<rss");
    String content = null;
    if (start != -1) {
        start = responseStr.indexOf(">", start);
        if (start != -1) {
            int end = responseStr.lastIndexOf("</rss>");
            if (end != -1) {
                content = responseStr.substring(start + 1, end);
            }
        }
    }
    if (content != null)
        System.out.println(content);
    else
        System.err.println("Content not found");

}

Output

------content-----
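
For the XML parser route recommended at the top of this answer, a minimal sketch using the JDK's built-in DOM parser could look like the following. It assumes the RSS fragment has already been isolated from the HTML (for example with the substring code above), since the combined response is not a single well-formed XML document. The class name, method name, and sample feed are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch; assumes rssXml holds only the <rss>...</rss> fragment.
class RssParseSketch {

    // Returns the text of the first <title> element, or null if there is none.
    static String firstTitle(String rssXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Apply the JDK's secure-processing limits, since feeds are usually untrusted input.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(rssXml)));

            NodeList titles = doc.getElementsByTagName("title");
            return titles.getLength() == 0 ? null : titles.item(0).getTextContent();
        } catch (Exception e) {
            // Covers parser configuration problems and input that is not well-formed.
            throw new IllegalArgumentException("could not parse RSS fragment", e);
        }
    }

    public static void main(String[] args) {
        String rssXml = "<rss version=\"2.0\"><channel><title>Demo feed</title></channel></rss>";
        System.out.println(firstTitle(rssXml)); // Demo feed
    }
}
```

Unlike the substring approach, the parser does not care which attributes the rss element carries, and it rejects input that is not well-formed instead of silently returning a wrong slice.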
