Question: Print character(s) in a string that are repeated consecutively only twice (not more).

Examples:

1)"aaabbaa" : b and a
2)"aabbaa" : a and b and a
3)"abba" : b

Code I tried:

String str = "aabbbbcccd";
Pattern p = Pattern.compile("(\\w){2}");
Matcher m = p.matcher(str);
while(m.find())
{
System.out.println(m.group(1));
}

Output:
a
b
b
c
d
However, the desired output is:
a
d

Postscript
As I have only recently started with regex, it would be highly appreciated if the answerer could
briefly explain the regex used (especially quantifiers and groups).

1 Answer

There is no single plain regex solution to this problem because you need a lookbehind with a backreference inside, which is not supported by the Java regex engine.

What you can do is either get all (\w)\1+ matches and then check their length using common string methods:

String s = "aaabbaa";
Pattern pattern = Pattern.compile("(\\w)\\1+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
    if (matcher.group().length() == 2) System.out.println(matcher.group(1)); 
} 

(See the Java demo.) Alternatively, you can match either three or more repetitions or exactly two, and print the character only if Group 2 matched:

String s = "aaabbaa";
Pattern pattern = Pattern.compile("(\\w)\\1{2,}|(\\w)\\2");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
    if (matcher.group(2) != null)
        System.out.println(matcher.group(2)); 
} 

See this Java demo. Regex details:

  • (\w)\1{2,} - a word char and two or more occurrences of the same char right after
  • | - or
  • (\w)\2 - a word char and the same char right after.
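
To try the alternation approach on the examples from the question, it can be wrapped in a small helper. This is only a sketch: the class name ExactlyTwice and the method name exactlyTwice are made up for illustration, and it assumes "character" means a word character (\w), as in the patterns above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class ExactlyTwice {
    // Alternative 1 consumes runs of three or more identical word chars,
    // so alternative 2 can only ever match a run of exactly two.
    private static final Pattern RUNS = Pattern.compile("(\\w)\\1{2,}|(\\w)\\2");

    static List<String> exactlyTwice(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = RUNS.matcher(s);
        while (m.find()) {
            // Group 2 is null whenever the first alternative matched.
            if (m.group(2) != null) {
                result.add(m.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(exactlyTwice("aaabbaa")); // [b, a]
        System.out.println(exactlyTwice("aabbaa"));  // [a, b, a]
        System.out.println(exactlyTwice("abba"));    // [b]
    }
}
```

The order of the alternatives matters: if (\w)\2 came first, it would match the first two characters of a longer run such as "aaa" and report a false hit.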

2 Comments

Of course. I had some doubts: 1) In the first case, instead of <<(\\w)\\1+>>, can I use <<"(\\w){1,}">> or <<"(\\w){2}">>? 2) Does <<\\1>> tell the compiler that it refers to group 1? 3) Can I use <<{1,}>> instead of <<+>>?
@believer {1,} is the same as +: it repeats the pattern it modifies one or more times. \w+ matches one or more letters, digits, or underscores, and thus matches ab123_any___, 123, _, etc. The \1 in (\w)\1 is a backreference to the Group 1 value, so it only matches the text that was captured. Use whatever fits your real requirements, but one thing is certain: you cannot simply replace \1 with +.
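
The difference between a quantified group and a backreference is easiest to see side by side on the string from the question. A rough sketch follows; BackrefDemo and allMatches are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class BackrefDemo {
    // Collects the full text of every match of regex in input.
    static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "aabbbbcccd";
        // Any two word chars, equal or not.
        System.out.println(allMatches("(\\w){2}", s));  // [aa, bb, bb, cc, cd]
        // A word char followed by the same char again.
        System.out.println(allMatches("(\\w)\\1", s));  // [aa, bb, bb, cc]
        // A whole run of two or more identical chars.
        System.out.println(allMatches("(\\w)\\1+", s)); // [aa, bbbb, ccc]
        // Same as \w+: one run of any word chars.
        System.out.println(allMatches("(\\w){1,}", s)); // [aabbbbcccd]
    }
}
```

This also explains the output in the question: with (\w){2}, the last match is "cd", and a quantified group keeps only its final iteration, so group(1) is "d".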
