
I have some text like this:

ab1ab2ab3ab4cd

Can one create a Java regular expression to obtain all substrings that start with "ab" and end with "cd"? For example:

ab1ab2ab3ab4cd
ab2ab3ab4cd
ab3ab4cd
ab4cd

Thanks


3 Answers

4

The regex (?=(ab.*cd)) captures each such match in group 1, as you can see:

import java.util.regex.*;

public class Main {
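  public static void main(String[] args) {

    // The look-ahead matches at each position without consuming input,
    // so overlapping candidates are all found; group 1 holds the matched text.
    Matcher m = Pattern.compile("(?=(ab.*cd))").matcher("ab1ab2ab3ab4cd");

    while (m.find()) {
      System.out.println(m.group(1));
    }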
  }
}

which produces:

ab1ab2ab3ab4cd
ab2ab3ab4cd
ab3ab4cd
ab4cd

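You need the look-ahead, (?= ... ), otherwise you'll get just one match, because a normal match consumes the text and the remaining "ab"s can no longer start a match of their own. Note that this regex will not produce all the desired results if the string contains more than one "cd": .* is greedy, so every match runs to the last "cd" and the shorter substrings ending at an earlier "cd" are never reported. In that case, you'll have to resort to a manual string algorithm.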
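For strings with several "cd"s, one possible manual approach is to pair every occurrence of the prefix with every later occurrence of the suffix. The following is only a sketch of that idea; the class name AllSubstrings and the method between are made up for illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

public class AllSubstrings {

  // Hypothetical helper: returns every substring of text that starts
  // with prefix and ends with suffix, including overlapping ones.
  static List<String> between(String text, String prefix, String suffix) {
    List<String> result = new ArrayList<>();
    // Try every occurrence of the prefix as a starting point.
    for (int start = text.indexOf(prefix); start >= 0;
         start = text.indexOf(prefix, start + 1)) {
      // Pair it with every suffix that begins at or after the end of the prefix.
      int from = start + prefix.length();
      for (int end = text.indexOf(suffix, from); end >= 0;
           end = text.indexOf(suffix, end + 1)) {
        result.add(text.substring(start, end + suffix.length()));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    for (String s : between("ab1cd2ab3cd", "ab", "cd")) {
      System.out.println(s);
    }
  }
}
```

For ab1cd2ab3cd this yields ab1cd, ab1cd2ab3cd and ab3cd, whereas the look-ahead regex reports only ab1cd2ab3cd and ab3cd. Unlike the regex, this sketch also lets the substring span line breaks, since . does not match line terminators by default.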


2 Comments

Out of curiosity, what should the regex look like for the "reversed" case? The string is ab1cd2cd3cd, and I want ab1cd, ab1cd2cd, ab1cd2cd3cd.
@Dudu, you can't do that with regex: the regex-engine evaluates from left to right.
1

Looks like you want either ab\w+?cd or \bab\w+?cd\b

4 Comments

I suspect the second is more along the lines of what he wants - or more realistically what he needs.
So how will you get all substrings that match this pattern?
No, that will find just one substring.
Sorry, misread the question. You could still find all substrings by repeatedly calling find with an offset just beyond the start of the last match, but Bart's solution is cleaner.
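The repeated-find idea from the last comment could look roughly like this. It is a sketch rather than the commenter's actual code: the class name OverlappingFind and the method findOverlapping are invented here, while Matcher.find(int) is the standard method that resets the matcher and searches from the given index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlappingFind {

  // Hypothetical helper: collects matches that may overlap by restarting
  // the search one character after the start of the previous match.
  static List<String> findOverlapping(String regex, String input) {
    List<String> matches = new ArrayList<>();
    Matcher m = Pattern.compile(regex).matcher(input);
    int from = 0;
    // The bounds check avoids an IndexOutOfBoundsException from find(int).
    while (from <= input.length() && m.find(from)) {
      matches.add(m.group());
      from = m.start() + 1;
    }
    return matches;
  }

  public static void main(String[] args) {
    findOverlapping("ab\\w+?cd", "ab1ab2ab3ab4cd").forEach(System.out::println);
  }
}
```

With the pattern ab\w+?cd and the question's input, this collects the same four substrings as the look-ahead version, at the cost of recompiling nothing but rescanning from each new offset.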
0
/^ab[a-z0-9]+cd$/gm

If only a, b, c and the digits 0-9 can appear in the middle, as in the examples:

/^ab[a-c\d]+cd$/gm

See it in action: http://regexr.com?2tpdu

3 Comments

That will just match/validate a single string, but the OP asked if there's a regex that matches all (sub) strings that match a particular pattern.
@Bart I'm unfamiliar with Java, does it have a global & multiline modifier?
Nope, no global modifier. You'll have to traverse all possible matches yourself, and once a (sub) pattern has been matched, it cannot be a part of another match: that's why I used the zero-width look-ahead in my answer.
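For reference, a rough Java counterpart of /^ab[a-z0-9]+cd$/gm might look like the sketch below: Pattern.MULTILINE stands in for the m flag, and looping over find() stands in for g. The class name MultilineLines and the method matchingLines are illustrative only. As the comments point out, this matches whole lines and does not find overlapping substrings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineLines {

  // Hypothetical helper: returns every line that consists entirely of
  // "ab", one or more lowercase letters or digits, and a trailing "cd".
  static List<String> matchingLines(String input) {
    // MULTILINE makes ^ and $ match at line boundaries, like the m flag.
    Pattern p = Pattern.compile("^ab[a-z0-9]+cd$", Pattern.MULTILINE);
    Matcher m = p.matcher(input);
    List<String> lines = new ArrayList<>();
    // Java has no global flag; calling find() repeatedly visits each match.
    while (m.find()) {
      lines.add(m.group());
    }
    return lines;
  }

  public static void main(String[] args) {
    matchingLines("ab1cd\nxyz\nab2ab3cd\nab").forEach(System.out::println);
  }
}
```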
