I am taking in a string from a website that looks along the lines of "<HTML CODE HERE>Text I want to get". I want to remove the angle brackets and the text within them, but my end result is always null.

Here is what I am trying:

try {
        String desc = null;
        StringBuilder sb = new StringBuilder();
        BufferedReader r = new BufferedReader(new InputStreamReader(in));
        String line = null;
        boolean codeBlock;
        codeBlock = false;

        line = "<HTMLCODEHERE>Text I want to get";
        System.out.println("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! STARTING DESC: " + line);

        while((line = r.readLine()) != null) {
            if((line = r.readLine()) == "<") {
                codeBlock = true;
            }
            if((line = r.readLine()) == ">") {
                codeBlock = false;
            }
            if(!codeBlock) {
                sb.append(line);
                desc = sb.toString();
            }
        }

        System.out.println("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! ENDING DESC: " + desc);
        holder.txtContent.setText(desc);
    } catch (IOException e) {
        e.printStackTrace();
    }
  • Did you run the code with a debugger? That's your best option because we can't know what kind of input the reader is receiving. Commented Dec 15, 2017 at 21:11
  • Could you share an example of your input? Commented Dec 15, 2017 at 21:11
  • Can you share sample input? Also, you should check out regular expressions; they will help you considerably in this scenario. Commented Dec 15, 2017 at 21:12
  • Carefully validate your if statements: does one line really contain just a single < or > character? Commented Dec 15, 2017 at 21:14
  • Possible duplicate of How do I compare strings in Java? Commented Dec 15, 2017 at 21:14

1 Answer

Have a look at the Java API for BufferedReader, namely readLine():

Reads a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.

https://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html#readLine()

Therefore your code here:

if((line = r.readLine()) == "<") {
    codeBlock = true;
}
if((line = r.readLine()) == ">") {
    codeBlock = false;
}

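These conditions will never be true: readLine() returns a whole line rather than a single character, and == compares object references, not string contents. Each extra readLine() call also consumes another line of input, so the line read in the while condition is thrown away before you look at it.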
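A minimal sketch of both points, assuming the input arrives through a BufferedReader as in the question. The class name ReadLineDemo and the helper joinLines are made up for illustration; the idea is simply to call readLine() once per loop iteration and to compare strings with equals() (or compare single characters with ==).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReadLineDemo {
    // Hypothetical helper: reads every line exactly once and joins them.
    static String joinLines(String input) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new StringReader(input))) {
            String line;
            // One readLine() call per iteration; any extra call would skip a line.
            while ((line = r.readLine()) != null) {
                sb.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String all = joinLines("<HTMLCODEHERE>Text I want to get");
        // The whole line comes back, so it can never equal "<" on its own.
        System.out.println(all.equals("<"));      // false
        // Single characters are primitives, so == is fine for them.
        System.out.println(all.charAt(0) == '<'); // true

        // == on strings compares references, not contents.
        String s = new String("<");
        System.out.println(s == "<");      // false
        System.out.println(s.equals("<")); // true
    }
}
```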

If I understand your question correctly, you want all the text that lies outside the HTML tags? You could use a library like jsoup, or go for a simpler implementation:

String parse = "<HTMLCODE>My favourite pasta is spaghetti, followed by ravioli</HTMLCODE>";

final char TAG_START = '<';
final char TAG_END = '>';

StringBuilder sb = new StringBuilder();

char[] parseChars = parse.toCharArray();

boolean inTag = false; // start outside a tag so any text before the first '<' is kept
for (int i = 0; i < parseChars.length; i++) {
    if (parseChars[i] == TAG_START) {
        inTag = true;
        continue;
    }
    else if (parseChars[i] == TAG_END) {
        inTag = false;
        continue;
    }
    if (!inTag) {
        sb.append(parseChars[i]);
    }
}

System.out.println(sb.toString());
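
As one of the comments on the question suggests, a regular expression can do the same job for simple input. This is only a rough sketch: the class name TagStripper and the method stripTags are hypothetical, and the pattern treats everything from a < up to the next > as a tag, so it is not a real HTML parser (a > inside an attribute value, a comment, or a script would trip it up).

```java
public class TagStripper {
    // Removes every "<...>" run from the input; a shortcut for simple
    // strings, not a substitute for an HTML parser.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String parse = "<HTMLCODE>My favourite pasta is spaghetti, followed by ravioli</HTMLCODE>";
        // prints: My favourite pasta is spaghetti, followed by ravioli
        System.out.println(stripTags(parse));
    }
}
```

For anything beyond short, well-formed snippets, a parser library is likely the safer choice.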

4 Comments

That is for sure something I will look into, but my issue is that my string is "<HTMLCODE>code i do not need><HTMLCODE/>Text I want to keep". My comment in the above thread has an example of the String I am trying to parse.
"<img width="534" height="462" src="hillsdalewatch.com/wp-content/uploads/2017/12/…" class="webfeedsFeaturedVisual wp-post-image" alt="">With the first snow showers of the season flying Thursday morning"
From the HTML you supplied above, my example will return: "With the first snow showers of the season flying Thursday morning". Is that not the behaviour you're looking for?
That is exactly what I am looking for! I apologize, I was a bit confused when I first read your code. Thank you so much!
