45

I am trying to write a method that will accept a String, inspect it for instances of certain tokens (e.g. ${fizz}, ${buzz}, ${foo}, etc.) and replace each token with a new string that is fetched from a Map<String,String>.

For example, if I pass this method the following string:

"How now ${fizz} cow. The ${buzz} had oddly-shaped ${foo}."

And if the method consulted the following Map<String,String>:

Key             Value
==========================
"fizz"          "brown"
"buzz"          "arsonist"
"foo"           "feet"

Then the resultant string would be:

"How now brown cow. The arsonist had oddly-shaped feet."

Here is my method:

String substituteAllTokens(Map<String,String> tokensMap, String toInspect) {
    String regex = "\\$\\{([^}]*)\\}";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(toInspect);
    while(matcher.find()) {
        String token = matcher.group();     // Ex: ${fizz}
        String tokenKey = matcher.group(1); // Ex: fizz
        String replacementValue = null;

        if(tokensMap.containsKey(tokenKey))
            replacementValue = tokensMap.get(tokenKey);
        else
            throw new RuntimeException("String contained an unsupported token.");

        toInspect = toInspect.replaceFirst(token, replacementValue);
    }

    return toInspect;
}

When I run this, I get the following exception:

Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal repetition near index 0
${fizz}
^
    at java.util.regex.Pattern.error(Pattern.java:1730)
    at java.util.regex.Pattern.closure(Pattern.java:2792)
    at java.util.regex.Pattern.sequence(Pattern.java:1906)
    at java.util.regex.Pattern.expr(Pattern.java:1769)
    at java.util.regex.Pattern.compile(Pattern.java:1477)
    at java.util.regex.Pattern.<init>(Pattern.java:1150)
    at java.util.regex.Pattern.compile(Pattern.java:840)
    at java.lang.String.replaceFirst(String.java:2158)
    ...rest of stack trace omitted for brevity (but available upon request!)

Why am I getting this? And what is the correct fix? Thanks in advance!

6 Answers

67

In ${fizz}, the { tells the regex engine that a repetition quantifier is starting, like {2,4}, which means "2 to 4 occurrences of the previous token". But {f is illegal, because { has to be followed by a number, so the engine throws an exception.

You need to escape all regex metacharacters (in this case $, { and }), for example with Pattern.quote (http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#quote(java.lang.String)), or use a different method that substitutes a string for a string rather than a regex for a string.
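A minimal sketch of both options, assuming the token is the literal text ${fizz}. The class and method names are made up for illustration; Pattern.quote, String.replaceFirst and String.replace are the standard library calls.

```java
import java.util.regex.Pattern;

final class LiteralReplaceDemo {
    // Option 1: quote the token so replaceFirst treats it as literal text, not as a regex.
    static String withQuote(String input, String token, String value) {
        return input.replaceFirst(Pattern.quote(token), value);
    }

    // Option 2: String.replace takes plain character sequences (and replaces every occurrence).
    static String withReplace(String input, String token, String value) {
        return input.replace(token, value);
    }

    public static void main(String[] args) {
        String input = "How now ${fizz} cow.";
        System.out.println(withQuote(input, "${fizz}", "brown"));
        System.out.println(withReplace(input, "${fizz}", "brown"));
    }
}
```

Note that option 1 still interprets the replacement value, so it is only safe as written when the value contains no $ or \ characters.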

18 Comments

The { is escaped though: "\\$\\{([^}]*)\\}"
@Brian No it's not. Stack trace: at java.lang.String.replaceFirst(String.java:2158) refers to this line: toInspect = toInspect.replaceFirst(token, replacementValue); And token's value is ${fizz}, with no escaping.
Ah, yes! Of course. Read too fast, I guess. +1 Maybe you could elaborate more on what exactly is being passed in and why it's failing.
Exactly - I'm still struggling to put your answer into motion, but thanks @Patashu and +1!
@TicketMonster replaceFirst expects a regular expression in the first argument, just like the Pattern.compile method. However, you're passing it ${fizz} without any escaping. Use Pattern.quote(token) before passing token in.
7

As Patashu pointed out, the problem is in replaceFirst(token, replacementValue), which expects a regex as its first argument, not a literal. Change it to replaceFirst(Pattern.quote(token), replacementValue) and it will work.

I also changed the first regex slightly to use + instead of *, since it should be a little faster, but that change is not necessary.

static String substituteAllTokens(Map<String,String> tokensMap, String toInspect) {
    String regex = "\\$\\{([^}]+)\\}";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(toInspect);
    String result = toInspect;
    while(matcher.find()) {
        String token = matcher.group();     // Ex: ${fizz}
        String tokenKey = matcher.group(1); // Ex: fizz
        String replacementValue = null;

        if(tokensMap.containsKey(tokenKey))
            replacementValue = tokensMap.get(tokenKey);
        else
            throw new RuntimeException("String contained an unsupported token.");

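        // quoteReplacement keeps '$' and '\' in the map value from being read as group references.
        result = result.replaceFirst(Pattern.quote(token), Matcher.quoteReplacement(replacementValue));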
    }

    return result;
}
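The replacement argument of replaceFirst is interpreted as well: $ and \ have special meaning there, so a map value such as "$5" would be read as a reference to capture group 5 and rejected. A minimal sketch of why map values are worth passing through Matcher.quoteReplacement (the class name, method name and sample value are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReplacementQuoteDemo {
    // Quotes both sides: the token as a literal pattern, the value as a literal replacement.
    static String replaceToken(String input, String token, String value) {
        return input.replaceFirst(Pattern.quote(token), Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        // The value starts with '$', which would otherwise be parsed as a group reference.
        System.out.println(replaceToken("Price: ${price}", "${price}", "$5"));
    }
}
```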

5 Comments

If you want to get really picky about speed, use [^}]++ to make it possessive instead of just greedy so it will never backtrack.
The speed results are surprising: using my method, the original regex ([^}]*) took 37.04 ms to run 10000 times, my regex ([^}]+) took 37.35 ms and suggested regex ([^}]++) 36.98 ms. Almost no difference.
For a longer input "How now ${fizz} cow. The ${buzz} had oddly-shaped ${foo}.How now ${fizz} cow. The ${buzz} had oddly-shaped ${foo}.${fizz}${foo}${buzz}${buzz}${foo}${fizz}${foo}${fizz}${buzz}${fizz}${foo}${buzz}${buzz}${foo}${fizz}${foo}${fizz}${buzz}${fizz}${foo}${buzz}${buzz}${foo}${fizz}${foo}${fizz}${buzz}" the results were different for regex. The original regex ([^}]*) took 515.47 ms to run 10000 times, my regex ([^}]+) took 514.41 ms and regex suggested by @Brian ([^}]++) 507.41 ms. Still close, but Brian's did better.
Did you use an actual benchmarking tool, like Caliper, to check? Also, you should make sure that you test with partially matched strings as well, such as The ${fizz cow with no terminating }. Also, to be fair, I said really picky :3
No, I wrote the benchmark myself. I did a warm-up loop at the beginning to activate HotSpot and called System.gc() several times outside the measurement. But, yes, I know it is not bulletproof. I didn't know about Caliper, though. Good to know!
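Speed aside, the three quantifiers are not fully interchangeable. A small sketch (hypothetical class and helper names) of how each regex from this thread treats a normal token, an empty token and an unterminated one:

```java
import java.util.regex.Pattern;

final class QuantifierDemo {
    static final String STAR = "\\$\\{([^}]*)\\}";        // greedy, zero or more
    static final String PLUS = "\\$\\{([^}]+)\\}";        // greedy, one or more
    static final String POSSESSIVE = "\\$\\{([^}]++)\\}"; // possessive, one or more

    // True if the regex matches anywhere in the input.
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        for (String regex : new String[] {STAR, PLUS, POSSESSIVE}) {
            System.out.println(regex
                    + " normal=" + finds(regex, "${fizz}")
                    + " empty=" + finds(regex, "${}")
                    + " unterminated=" + finds(regex, "The ${fizz cow"));
        }
    }
}
```

Only the * version accepts the empty token ${}; none of the three matches the unterminated input.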
2

Adapted from Matcher.replaceAll

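boolean result = matcher.find();
if (result) {
    StringBuffer sb = new StringBuffer();
    do {
        String tokenKey = matcher.group(1); // Ex: fizz
        // quoteReplacement makes '$' and '\' in the value literal.
        String replacement = Matcher.quoteReplacement(tokensMap.get(tokenKey));
        // Appends the text since the previous match, then the replacement.
        matcher.appendReplacement(sb, replacement);
        result = matcher.find();
    } while (result);
    matcher.appendTail(sb); // copy whatever follows the last match
    return sb.toString();
}
return toInspect; // no tokens found, nothing to replace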
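For context, here is one way the fragment could be wrapped into a complete method. This is a sketch under assumptions: the class name is hypothetical, the regex is the one from the question, and unknown tokens (or null map values) raise the same kind of RuntimeException the question uses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenSubstitutor {
    // Compiled once; group 1 captures the key inside ${...}.
    private static final Pattern TOKEN = Pattern.compile("\\$\\{([^}]*)\\}");

    static String substituteAllTokens(Map<String, String> tokensMap, String toInspect) {
        Matcher matcher = TOKEN.matcher(toInspect);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            String tokenKey = matcher.group(1);
            String value = tokensMap.get(tokenKey);
            if (value == null) { // missing key or null value
                throw new RuntimeException("String contained an unsupported token: " + tokenKey);
            }
            // Copies the text before this match, then the literal replacement.
            matcher.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(sb); // text after the last match (or the whole input if none matched)
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> tokens = new HashMap<>();
        tokens.put("fizz", "brown");
        tokens.put("buzz", "arsonist");
        tokens.put("foo", "feet");
        System.out.println(substituteAllTokens(tokens,
                "How now ${fizz} cow. The ${buzz} had oddly-shaped ${foo}."));
    }
}
```

The input is scanned once and the output is built in a single buffer, which fits the explanation in the comments for why this approach beats repeated replaceFirst calls. On Java 9 and later, Matcher.replaceAll also accepts a Function<MatchResult, String>, which can express the same loop more compactly.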

4 Comments

Adding my test results using this method: the original regex ([^}]*) took 15.14 ms to run 10000 times, my regex ([^}]+) took 14.88 ms and the regex suggested by @Brian ([^}]++) 15.89 ms. As I said before, this method is an absolute winner in terms of speed, and here my regex fits better :)
@Miguel Try using a very long toInspect as input, such as "${fizz}${buzz}${foo}${fizz}${buzz}${foo}${fizz}${buzz}${foo}...${fizz}${buzz}${foo}${fizz}${buzz}${foo}${fizz}${buzz}${foo}${fizz}${buzz}${foo}" (I bet my StringBuffer will be faster than your String; as the length grows, it won't be only 3x faster.)
Using "How now ${fizz} cow. The ${buzz} had oddly-shaped ${foo}.How now ${fizz} cow. The ${buzz} had oddly-shaped ${foo}.${fizz}${foo}${buzz}${buzz}${foo}${fizz}${foo}${fizz}${buzz}${fizz}${foo}${buzz}${buzz}${foo}${fizz}${foo}${fizz}${buzz}${fizz}${foo}${buzz}${buzz}${foo}${fizz}${foo}${fizz}${buzz}" as the input, this method performed even better than mine: it was 5x faster. The results were different for the regexes. The original regex ([^}]*) took 92.94 ms to run 10000 times, my regex ([^}]+) took 93.6 ms and the regex suggested by @Brian ([^}]++) 91.83 ms. Brian's did better. Why is that so?
@Miguel I can only explain the 3x->5x part: 1. replaceFirst runs a regex again, which is unnecessary. 2. In a loop, StringBuffer concatenates strings faster than String does, since String creates a new String every time. Which regex you use is not important.
2

You can make your regex a bit uglier, but this will work:

String regex = "\\$[\\{]([^}]*)[\\}]";

0

Use String.replaceAll. Sample input string for testing "SESSIONKEY1":

"${SOMESTRING.properties.SESSIONKEY1}"

    // Matches a ${SOMESTRING. ... } token including its surrounding double quotes.
    String pattern = "\\\"\\$\\{SOMESTRING\\.[^\\}]+\\}\\\"";
    System.out.println(pattern);
    String result = inputString.replaceAll(pattern, "null");
    return result;

0

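If you don't know the input string in advance and it may contain $ or {, you can do the replacement literally, without any regex:

public static String replacemyownSubStr(String i, String target, String replacement) {
    if (target.isEmpty()) {
        return i; // an empty target would otherwise match forever at the same index
    }
    StringBuilder result = new StringBuilder();
    int lastIndex = 0;                 // end of the previous match
    int index = i.indexOf(target);     // start of the next match
    while (index >= 0) {
        result.append(i, lastIndex, index); // text between matches
        result.append(replacement);
        lastIndex = index + target.length();
        index = i.indexOf(target, lastIndex);
    }
    result.append(i.substring(lastIndex)); // text after the last match
    return result.toString();
}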
