I have some text in the format shown below. Each block starts with a line made up of a +, a space, and some text. It is followed by one or more consecutive lines that each start with a minus sign, @, %, or *, then a space and some text. I would like to capture each block separately using regular expressions.

+ you rock
- I rock and rule.

+ you rule
- I rock and rule.
- That is a perfect artificial entity.

+ you made a mistake
- That is impossible. I never make mistakes.
- I guess so, something must have gone wrong.

Output

Block 1 + you rock - I rock and rule.

Block 2 + you rule - I rock and rule. - That is a perfect artificial entity.

This is my current regular expression

(^\+.*$)(?:\r?\n)(?:(^[-%@\*].*$)(?:\r?\n)?)+

In the above expression, group 1 is (^\+.*$), which captures the line starting with +. Group 2 is (^[-%@\*].*$), which captures the second part, but note that there may be more than one line that starts with a -.

When I run a while loop in Java code

Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(contents);
while (matcher.find()) {
    // This gives me the item following +
    System.out.println(matcher.group(1));
    // This gives me ONLY the last line starting with -; how do I get all of them?
    System.out.println(matcher.group(2));
}

How do I get all the statements that have a minus sign in front of them as an array?
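
A note on why group(2) holds only the last line: when a capturing group sits inside a repetition, the matcher keeps only the text from the final iteration. The following is a minimal sketch of one workaround, assuming it is acceptable to wrap the whole repetition in one outer group and split that group afterwards. The class name ReplyBlocks and the method parse are illustrative and not part of the original code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplyBlocks {
    // The question's expression with one extra pair of parentheses, so that
    // group 2 spans ALL reply lines instead of only the last repetition.
    private static final Pattern BLOCK = Pattern.compile(
            "(^\\+.*$)(?:\\r?\\n)((?:^[-%@\\*].*$(?:\\r?\\n)?)+)",
            Pattern.MULTILINE);

    // Maps each "+" line to the array of reply lines below it.
    // Assumption: "+" lines are unique; a repeated one overwrites the earlier entry.
    static Map<String, String[]> parse(String contents) {
        Map<String, String[]> blocks = new LinkedHashMap<>();
        Matcher matcher = BLOCK.matcher(contents);
        while (matcher.find()) {
            // Split the captured run of reply lines into one element per line.
            String[] replies = matcher.group(2).split("\\r?\\n");
            blocks.put(matcher.group(1), replies);
        }
        return blocks;
    }

    public static void main(String[] args) {
        String contents = "+ you rule\n- I rock and rule.\n- That is a perfect artificial entity.\n";
        parse(contents).forEach((trigger, replies) ->
                System.out.println(trigger + " -> " + String.join(" | ", replies)));
    }
}
```

The split happens outside the regular expression, so the pattern only has to find block boundaries.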

  • Basically, you can't do it. You'll need split or a loop. But I want to question your premise: Why is it important to solve this with a regular expression? What's wrong with other types of solutions? Are they somehow uncool? I've been trying to understand this for years--why so many programmers seem to assume that they need to find the magical regex to solve something. Second question: Do you think that code like (^\+.*$)(?:\r?\n)(?:(^[-%@\*].*$)(?:\r?\n)?)+ makes a program more or less readable? Commented Sep 5, 2016 at 16:53
  • Nice point, but honestly I am not aware of what other solutions may work for this problem. I have a file with around 300000 lines of text in this format and would like to parse it into blocks, then extract and process information from them. Commented Sep 5, 2016 at 16:57
  • Btw, how can I split or loop over the second part, which has a - sign? Commented Sep 5, 2016 at 16:57
  • Hint: since Java 8 we have \R, which represents line separators (including \r\n, so you can use it instead of \r?\n). Anyway, why do you want to have these parts in an array? What are you trying to achieve? Maybe reading this text line by line, while checking whether you have reached an empty line, would be easier? Commented Sep 5, 2016 at 16:58
  • "Not aware of other solutions"? Can't you just read one line at a time and use charAt() or startsWith() to see if the first character is + or -? Commented Sep 5, 2016 at 17:00
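
The line-by-line approach suggested in the comments above could look roughly like this. It is a sketch under stated assumptions: a blank line is taken to end a block, and the names LineBlockParser, Block, trigger and replies are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class LineBlockParser {
    // One block: the "+" line and the reply lines directly below it.
    static final class Block {
        final String trigger;
        final List<String> replies = new ArrayList<>();
        Block(String trigger) { this.trigger = trigger; }
    }

    static List<Block> parse(String contents) {
        List<Block> blocks = new ArrayList<>();
        Block current = null;
        // \R (Java 8+) matches any line break, including \r\n.
        for (String line : contents.split("\\R")) {
            if (line.isEmpty()) {
                current = null;                    // assumption: a blank line ends the block
            } else if (line.startsWith("+")) {
                current = new Block(line);         // a "+" line opens a new block
                blocks.add(current);
            } else if (current != null && "-%@*".indexOf(line.charAt(0)) >= 0) {
                current.replies.add(line);         // reply line belongs to the open block
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        for (Block b : parse("+ you rock\n- I rock and rule.\n")) {
            System.out.println(b.trigger + " -> " + b.replies);
        }
    }
}
```

For a file of around 300000 lines, the same loop body would work on lines read one at a time with BufferedReader.readLine(), and replies.toArray(new String[0]) gives the array form asked for in the question.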

1 Answer

Using the regexp ^\+[^+]* with the m and g modifiers gives you the result you need:
https://regex101.com/r/bH1aQ9/1

On your test data the result will be 3 matches, each starting with a + character.

The idea is to treat all your lines as one long string and split it into chunks that each start with a + and contain no other + inside them.
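
In Java, the m modifier corresponds to Pattern.MULTILINE and the g modifier to calling find() in a loop. Here is a hedged sketch of that translation, which also splits each match into its lines. The class name PlusBlockSplitter and the method parse are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlusBlockSplitter {
    // "+" at the start of a line, then everything up to the next "+".
    private static final Pattern BLOCK = Pattern.compile("^\\+[^+]*", Pattern.MULTILINE);

    // Returns one list per block: element 0 is the "+" line, the rest are reply lines.
    static List<List<String>> parse(String contents) {
        List<List<String>> result = new ArrayList<>();
        Matcher matcher = BLOCK.matcher(contents);
        while (matcher.find()) {
            // The match runs up to the next "+", so trim the trailing blank lines.
            String block = matcher.group().trim();
            result.add(Arrays.asList(block.split("\\R")));
        }
        return result;
    }

    public static void main(String[] args) {
        String contents = "+ you rock\n- I rock and rule.\n\n+ you rule\n- I rock and rule.\n";
        parse(contents).forEach(System.out::println);
    }
}
```

Each inner list can be cut into the + line and its replies with get(0) and subList(1, size()).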

Update

To take into account @Pshemo's note about possible + characters inside lines:

The regexp ^\+.*?(^$|\Z) with the s, m and g modifiers
https://regex101.com/r/bH1aQ9/1
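
A possible Java translation of this second expression, offered as a sketch: the s modifier is assumed to map to Pattern.DOTALL, m to Pattern.MULTILINE, and g to a find() loop. The class name BlankLineBlocks and the method parse are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlankLineBlocks {
    // "+" at the start of a line, then as little as possible up to the next
    // empty line (^$) or the end of the input (\Z).
    private static final Pattern BLOCK = Pattern.compile(
            "^\\+.*?(^$|\\Z)", Pattern.DOTALL | Pattern.MULTILINE);

    // Returns one list per block: element 0 is the "+" line, the rest are reply lines.
    static List<List<String>> parse(String contents) {
        List<List<String>> result = new ArrayList<>();
        Matcher matcher = BLOCK.matcher(contents);
        while (matcher.find()) {
            // The match may end with a line break; trim it before splitting.
            String block = matcher.group().trim();
            result.add(Arrays.asList(block.split("\\R")));
        }
        return result;
    }

    public static void main(String[] args) {
        String contents = "+ I like + operator\n- Me too.\n\n+ you rule\n- I rock and rule.\n";
        parse(contents).forEach(System.out::println);
    }
}
```

Because a block now ends at an empty line and not at the next +, a + inside a line no longer cuts the match short.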

10 Comments

@PirateApp [^+]* means 0 or more (the * postfix) of the characters described by the brackets, and ^+ inside the brackets means any character except (^) a plus.
@PirateApp thx! But be careful with regexps: regex.info/blog/2006-09-15/247
@PirateApp Just be aware that this solution will not accept any + inside your group (except the first one). So, for instance, "+ I like + operator" will stop at "+ I like ".
@Pshemo you are right, but there was no such corner case in the source data. I would update the task conditions with this note.
@Pshemo added a second solution based on your note