I have a string that contains multiple messages. Each message starts with a certain character sequence. I've tried:

String str = "ab message1ab message2ab message3";
Pattern pattern = Pattern.compile("(?<record>ab\\p{ASCII}+(?!ab))");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
    handleMessage(matcher.group("record"));
}

but \p{ASCII}+ is greedy and consumes everything up to the end of the string. The characters a and b can each appear inside a message; only the sequence ab marks the start of the next message.

  • You may try splitting string on each "ab " or whatever char sequence you have in front of it. Commented Feb 19, 2018 at 12:25
  • Try String[] res =str.split("(?!^)\\s*(?=ab)");. If the ab is always at the end of the word, add \\b after ab in the pattern (=> "(?!^)\\s*(?=ab\\b)"). Commented Feb 19, 2018 at 12:26
  • How do you know that the character sequence is not part of the message, as in "I am absent"? Commented Feb 19, 2018 at 12:29
  • Replace \p{ASCII}+ with [^ ab]+ Commented Feb 19, 2018 at 12:31
  • 1
    @TaherKhorshidi bad idea: this would forbid any a or b in the message. Commented Feb 19, 2018 at 12:33

1 Answer

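\p{ASCII}+ is greedy: it matches one or more ASCII characters and takes the longest possible match, so it runs past the following ab markers to the end of the string. If you want the shortest possible match, use the reluctant quantifier instead: \p{ASCII}+?. A reluctant quantifier needs something after it to stop at, so pair it with a positive lookahead assertion that requires the next ab (or the end of input) to follow.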

The regex could become:

Pattern pattern = Pattern.compile("(?<record>ab\\p{ASCII}+?)(?=(ab)|\\z)");

Note the (ab)|\z alternation in the lookahead: \z matches at the end of input, so the last message, which has no ab after it, is matched as well.
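For completeness, here is a minimal runnable sketch of the reluctant-quantifier approach applied to the sample string from the question. The class name MessageSplitter and the method extractMessages are hypothetical names chosen for illustration (the question's handleMessage is not shown, so the sketch simply collects the matches), and the capturing group around ab in the lookahead is dropped because it is not needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
class MessageSplitter {

    // "ab", then as few ASCII characters as possible (at least one),
    // stopping right before the next "ab" or at the end of input.
    private static final Pattern RECORD =
            Pattern.compile("(?<record>ab\\p{ASCII}+?)(?=ab|\\z)");

    // Collects every record found in the input, in order of appearance.
    static List<String> extractMessages(String input) {
        List<String> messages = new ArrayList<>();
        Matcher matcher = RECORD.matcher(input);
        while (matcher.find()) {
            messages.add(matcher.group("record"));
        }
        return messages;
    }

    public static void main(String[] args) {
        String str = "ab message1ab message2ab message3";
        for (String message : extractMessages(str)) {
            System.out.println(message);
        }
    }
}
```

A single a or b inside a message does not end the record; only the two-character sequence ab does. As one of the comments points out, a message that itself contains ab (for example "absent") would still be split at that point, so this only works if the marker cannot occur inside a message.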


1 Comment

Thank you. Works perfect
