4

I have a string v that may appear several times in a row inside another string. I want to collapse every run of consecutive vs into a single v. For example:

String s = "Hello, world!";
String v = "l";

The regex would turn "Hello, world!" into "Helo, world!"

So I want to do something like

s = s.replaceAll(vv+, v)

But obviously that won't work. Thoughts?

5 Answers

17

Let's develop the solution iteratively; at each step we point out the problems and fix them until we arrive at the final answer.

We can start with something like this:

String s = "What???? Impo$$ible!!!";
String v = "!";

s = s.replaceAll(v + "{2,}", v);
System.out.println(s);
// "What???? Impo$$ible!"

{2,} is the regex quantifier syntax for bounded repetition with no upper limit, meaning "at least 2 of the preceding element" in this case.
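As a small illustration of the quantifier on its own, here is a minimal sketch; the class and method names are made up for this example and are not part of the original answer.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, used only to illustrate the {2,} quantifier.
class QuantifierDemo {
    // Returns true if the whole input consists of two or more '!' characters.
    static boolean atLeastTwoBangs(String input) {
        return Pattern.matches("!{2,}", input);
    }

    public static void main(String[] args) {
        System.out.println(atLeastTwoBangs("!"));    // false
        System.out.println(atLeastTwoBangs("!!"));   // true
        System.out.println(atLeastTwoBangs("!!!!")); // true
    }
}
```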

It just so happens that the above works because ! is not a regex metacharacter. Let's see what happens if we try the following:

String v = "?";

s = s.replaceAll(v + "{2,}", v);
// Exception in thread "main" java.util.regex.PatternSyntaxException:       
// Dangling meta character '?'

One way to fix the problem is to use Pattern.quote so that v is taken literally:

s = s.replaceAll(Pattern.quote(v) + "{2,}", v);
System.out.println(s);
// "What? Impo$$ible!!!"

It turns out that this isn't the only thing we need to worry about: in replacement strings, \ and $ are also special metacharacters. That explains why we get the following problem:

String v = "$";
s = s.replaceAll(Pattern.quote(v) + "{2,}", v);
// Exception in thread "main" java.lang.StringIndexOutOfBoundsException:
// String index out of range: 1

Since we want v to be taken literally as a replacement string, we use Matcher.quoteReplacement as follows:

s = s.replaceAll(Pattern.quote(v) + "{2,}", Matcher.quoteReplacement(v));
System.out.println(s);
// "What???? Impo$ible!!!"

Lastly, repetition has higher precedence than concatenation. This means the following:

System.out.println(  "hahaha".matches("ha{3}")    ); // false
System.out.println(  "haaa".matches("ha{3}")      ); // true
System.out.println(  "hahaha".matches("(ha){3}")  ); // true

So if v can contain multiple characters, you'd want to group it before applying the repetition. You can use a non-capturing group in this case, since you don't need to create a backreference.

String s = "well, well, well, look who's here...";
String v = "well, ";
s = s.replaceAll("(?:" +Pattern.quote(v)+ "){2,}", Matcher.quoteReplacement(v));
System.out.println(s);
// "well, look who's here..."
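Putting the steps together, the final expression could be wrapped in a small helper. This is only a sketch: the class name, the method name collapse, and the guard for an empty v are assumptions added for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper wrapping the final replaceAll expression.
class CollapseRepeats {
    // Collapses every run of two or more consecutive copies of v into one v.
    static String collapse(String s, String v) {
        if (v.isEmpty()) {
            return s; // assumption: an empty v leaves the input unchanged
        }
        // Group the quoted literal so that {2,} applies to all of v.
        String regex = "(?:" + Pattern.quote(v) + "){2,}";
        // Quote the replacement so that \ and $ in v are taken literally.
        return s.replaceAll(regex, Matcher.quoteReplacement(v));
    }

    public static void main(String[] args) {
        System.out.println(collapse("Hello, world!", "l"));
        // Helo, world!
        System.out.println(collapse("What???? Impo$$ible!!!", "$"));
        // What???? Impo$ible!!!
        System.out.println(collapse("well, well, well, look who's here...", "well, "));
        // well, look who's here...
    }
}
```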

Summary

  • To match an arbitrary literal string that may contain regex metacharacters, use Pattern.quote
  • To replace with an arbitrary literal string that may contain replacement metacharacters, use Matcher.quoteReplacement
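The two points in the summary can be seen side by side in a short sketch. The class and method names here are hypothetical; only Pattern.quote and Matcher.quoteReplacement come from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of quoting both sides of a regex replacement.
class LiteralQuoting {
    // Replaces every literal occurrence of target with the literal text replacement.
    static String replaceLiteral(String s, String target, String replacement) {
        return s.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // "+" is a regex metacharacter; "$" and "\" are replacement metacharacters.
        System.out.println(replaceLiteral("1+1=2", "+", "$\\"));
        // 1$\1=2
    }
}
```

For a plain literal substitution with no quantifier, String.replace(CharSequence, CharSequence) already behaves this way; the quoting methods matter when the literal is embedded in a larger pattern, as in the answer above.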

Bonus material

The following example uses reluctant repetition, a capturing group, and a backreference, combined with case-insensitive matching:

    System.out.println(
        "omgomgOMGOMG???? Yes we can! YES WE CAN! GOAAALLLL!!!!"
            .replaceAll("(?i)(.+?)\\1+", "$1")
    );
    // "omg? Yes we can! GOAL!"
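To see why the repetition is reluctant, it may help to compare it with the greedy form on a short input. This is a sketch with made-up class and method names; it drops the (?i) flag to keep the comparison focused.

```java
// Hypothetical comparison of reluctant and greedy capture before a backreference.
class ReluctantVsGreedy {
    // (.+?) captures the shortest unit that is followed by at least one copy of itself.
    static String collapseReluctant(String s) {
        return s.replaceAll("(.+?)\\1+", "$1");
    }

    // (.+) captures the longest unit that is followed by at least one copy of itself.
    static String collapseGreedy(String s) {
        return s.replaceAll("(.+)\\1+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseReluctant("aaaa")); // a
        System.out.println(collapseGreedy("aaaa"));    // aa
    }
}
```

With the greedy form the captured unit is "aa", so a run of four only shrinks to two; the reluctant form captures "a" and lets \1+ consume the rest of the run.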

2 Comments

This is a way better solution, even down to the "{2,}" being better regex form than concatenating. Both aren't functionally necessary since just a Pattern.quote(v) + "+" would work (a single match being replaced with itself results in no change).
Don't you need to add non-capturing parentheses, e.g. "(?:"+Pattern.quote(v)+"){2,}", for multiple characters in the string? (As per my answer and gustafc's.)
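The first comment's point, that "+" and "{2,}" give the same result here, can be checked with a quick sketch. The class name and the collapseWith helper are hypothetical, and the group from the second comment is included so that multi-character values work as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical check that "+" and "{2,}" produce the same output for this task.
class PlusVersusTwoOrMore {
    // Builds the grouped, quoted pattern with the given quantifier and applies it.
    static String collapseWith(String quantifier, String s, String v) {
        String regex = "(?:" + Pattern.quote(v) + ")" + quantifier;
        return s.replaceAll(regex, Matcher.quoteReplacement(v));
    }

    public static void main(String[] args) {
        String s = "What???? Impo$$ible!!!";
        // With "+", a single "?" is also matched, but it is replaced by itself.
        System.out.println(collapseWith("+", s, "?"));    // What? Impo$$ible!!!
        System.out.println(collapseWith("{2,}", s, "?")); // What? Impo$$ible!!!
    }
}
```

The difference is only in how much work is done: "{2,}" skips isolated occurrences, while "+" matches them and substitutes an identical string.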
5

Use x{2,} to match x at least twice.

To be able to replace characters with special meanings for regexps, you'd use Pattern.quote (and Matcher.quoteReplacement, so that \ and $ in the replacement are taken literally):

String part = Pattern.quote(v);
s = s.replaceAll(part + "{2,}", Matcher.quoteReplacement(v));

To replace things longer than one character, use non-capturing groups:

String part = "(?:" + Pattern.quote(v) + ")";
s = s.replaceAll(part + "{2,}", Matcher.quoteReplacement(v));

1 Comment

+1; I incorporated the need for grouping into my answer as well.
4

You need to concatenate the two "v" Strings.

Try s = s.replaceAll(v + v + "+", v)

1 Comment

This will only work for characters that aren't special characters in a regex context.
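The limitation in this comment shows up in two ways: some values of v make the pattern invalid, and others compile but silently match the wrong thing. The following sketch uses hypothetical class and method names to contrast the unquoted form with a quoted one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo of what goes wrong when v is used as a regex unquoted.
class UnquotedPitfalls {
    // The unquoted approach from the answer above.
    static String naive(String s, String v) {
        return s.replaceAll(v + v + "+", v);
    }

    // The same idea with v quoted on both the pattern and the replacement side.
    static String quoted(String s, String v) {
        return s.replaceAll("(?:" + Pattern.quote(v) + "){2,}", Matcher.quoteReplacement(v));
    }

    public static void main(String[] args) {
        System.out.println(naive("Hello, world!", "l")); // Helo, world!

        // "." is a metacharacter: "..+" matches any two or more characters.
        System.out.println(naive("a..b", "."));  // .
        System.out.println(quoted("a..b", ".")); // a.b

        try {
            naive("What????", "?"); // "??+" is not a valid pattern
        } catch (PatternSyntaxException e) {
            System.out.println("invalid pattern: " + e.getPattern());
        }
    }
}
```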
3

With regexes in Java, make sure to use Pattern.quote and Matcher.quoteReplacement:

package com.example.test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex2 {
    public static void main(String[] args)
    {
        String s = "Hello, world!";
        String v = "l";

        System.out.println(doit(s,v));

        s = "Test: ??r??r Solo ??r Frankenstein!";
        v = "??r";

        System.out.println(doit(s,v));

    }

    private static String doit(String s, String v) 
    {
        Pattern p = Pattern.compile("(?:"+Pattern.quote(v)+"){2,}");

        Matcher m = p.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find())
        {
            m.appendReplacement(sb, Matcher.quoteReplacement(v));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}

2 Comments

Can't wait until Matcher takes any Appendable instead of StringBuffer... It's an RFE somewhere in the bugdb...
agreed. (would much rather use StringBuilder)
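On Java 9 and later, Matcher has overloads of appendReplacement and appendTail that accept a StringBuilder, so the loop above can be written without StringBuffer. A minimal sketch, assuming a Java 9+ runtime; the class name is made up, and the logic otherwise mirrors doit from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical StringBuilder variant of the answer's doit method (requires Java 9+).
class Regex2Builder {
    static String doit(String s, String v) {
        Pattern p = Pattern.compile("(?:" + Pattern.quote(v) + "){2,}");
        Matcher m = p.matcher(s);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // Appends the text since the previous match, then one literal copy of v.
            m.appendReplacement(sb, Matcher.quoteReplacement(v));
        }
        // Appends whatever follows the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doit("Hello, world!", "l"));
        // Helo, world!
        System.out.println(doit("Test: ??r??r Solo ??r Frankenstein!", "??r"));
        // Test: ??r Solo ??r Frankenstein!
    }
}
```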
2
s = s.replaceAll (v + "+", v)

1 Comment

I didn't downvote but this will only work for characters that aren't special characters in a regex context.
