I have some text in a particular format, shown below. Each block starts with a line consisting of a +, a space, and some text. It is followed by one or more consecutive lines that each start with a minus sign, @, %, or *, then a space and some text. I would like to capture each block separately using regular expressions.
+ you rock
- I rock and rule.
+ you rule
- I rock and rule.
- That is a perfect artificial entity.
+ you made a mistake
- That is impossible. I never make mistakes.
- I guess so, something must have gone wrong.
Output
Block 1
+ you rock
- I rock and rule.
Block 2
+ you rule
- I rock and rule.
- That is a perfect artificial entity.
This is my current regular expression:
(^\+.*$)(?:\r?\n)(?:(^[-%@\*].*$)(?:\r?\n)?)+
In the above expression, group 1 = (^\+.*$) captures the line that starts with a +, and group 2 = (^[-%@\*].*$) captures the second part. Notice, however, that there may be more than one line that starts with a - (or @, %, *).
When I run a while loop in Java code:
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(contents);
while (matcher.find()) {
    // This gives me the line starting with +
    System.out.println(matcher.group(1));
    // This gives me ONLY the last line starting with -; how do I get all of them?
    System.out.println(matcher.group(2));
}
How do I get all the statements that have a minus sign in front of them as an array?
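One possible way to get there while keeping a regex, offered as a sketch rather than a definitive answer: a capturing group inside a repetition only retains its last iteration, so the idea is to wrap the whole run of reply lines in a single group and split that group into lines afterwards. The class name BlockRegexSketch, the Block holder, and the parse method are hypothetical names made up for this illustration, and the \R linebreak matcher assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlockRegexSketch {

    /** Hypothetical holder for one block: the "+" line and its reply lines. */
    static final class Block {
        final String trigger;
        final String[] replies;

        Block(String trigger, String[] replies) {
            this.trigger = trigger;
            this.replies = replies;
        }
    }

    // Group 1: the "+" line.
    // Group 2: the entire run of lines starting with -, %, @ or *, as ONE string.
    // \R matches any linebreak (Java 8+); "(?:\R|$)" also accepts a last line
    // that is not terminated by a linebreak.
    private static final Pattern BLOCK = Pattern.compile(
            "^(\\+.*)\\R((?:^[-%@*].*(?:\\R|$))+)", Pattern.MULTILINE);

    static List<Block> parse(String contents) {
        List<Block> blocks = new ArrayList<>();
        Matcher matcher = BLOCK.matcher(contents);
        while (matcher.find()) {
            // Split the captured run into individual reply lines.
            // String.split drops trailing empty strings, so a final
            // linebreak does not produce an empty element.
            String[] replies = matcher.group(2).split("\\R");
            blocks.add(new Block(matcher.group(1), replies));
        }
        return blocks;
    }

    public static void main(String[] args) {
        String contents = String.join("\n",
                "+ you rock",
                "- I rock and rule.",
                "+ you rule",
                "- I rock and rule.",
                "- That is a perfect artificial entity.");
        for (Block block : parse(contents)) {
            System.out.println(block.trigger + " -> " + Arrays.toString(block.replies));
        }
    }
}
```

Each Block then carries its reply lines as a String[], which is the array asked for above. A "+" line with no reply lines after it is skipped by this pattern, which may or may not be what is wanted.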
split or a loop. But I want to question your premise: Why is it important to solve this with a regular expression? What's wrong with other types of solutions? Are they somehow uncool? I've been trying to understand this for years: why do so many programmers seem to assume that they need to find the magical regex to solve something? Second question: Do you think that code like (^\+.*$)(?:\r?\n)(?:(^[-%@\*].*$)(?:\r?\n)?)+ makes a program more or less readable?

\R which represents line separators (including \r\n so you can use it instead of \r?\n). Anyway, why do you want to have these parts in an array? What are you trying to achieve? Maybe reading this text line by line, while checking whether you found an empty line, would be easier?

charAt() or startsWith() to see if the first character is + or -?
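The line-by-line idea raised in the comments could look roughly like the sketch below. It assumes the whole text is already in a String and that any line not starting with +, -, %, @ or * can be ignored. LineByLineSketch and parseBlocks are hypothetical names chosen for this example, and splitting on \R assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;

class LineByLineSketch {

    /**
     * Returns one list per block: element 0 is the "+" line,
     * the remaining elements are the lines that follow it.
     */
    static List<List<String>> parseBlocks(String contents) {
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = null; // block being filled, if any

        for (String line : contents.split("\\R")) {
            if (line.startsWith("+ ")) {
                // A "+" line opens a new block.
                current = new ArrayList<>();
                current.add(line);
                blocks.add(current);
            } else if (current != null
                    && !line.isEmpty()
                    && "-%@*".indexOf(line.charAt(0)) >= 0) {
                // A reply line belongs to the most recently opened block.
                current.add(line);
            }
            // Assumption: empty or unrecognized lines are simply skipped.
        }
        return blocks;
    }

    public static void main(String[] args) {
        String contents = String.join("\n",
                "+ you rock",
                "- I rock and rule.",
                "+ you rule",
                "- I rock and rule.",
                "- That is a perfect artificial entity.");
        for (List<String> block : parseBlocks(contents)) {
            // Everything after the first element is a reply line.
            String[] replies = block.subList(1, block.size()).toArray(new String[0]);
            System.out.println(block.get(0) + " has " + replies.length + " reply line(s)");
        }
    }
}
```

This version trades the single pattern for a few explicit branches, which tends to make the block boundaries easier to see and to change later.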