6

I had to match a digit followed by itself 14 times, so I came up with the following regular expression in the regexstor.net/tester:

(\d)\1{14}

Edit

When I pasted it into my code, with the backslashes properly escaped:

"(\\d)\\1{14}"

I replaced the back-reference "\1" with "$1", which is what Java uses to refer to groups when replacing matches.

Then I realized that it doesn't work. In Java, when you need to back-reference a group inside the regex itself, you have to use "\N", but when you want to refer to it in a replacement, the syntax is "$N".

My question is: why?

3
  • That is not just Java: in most regex flavors, \N is the back-reference syntax inside the pattern, and $ has a special meaning in a regex. Commented Jun 9, 2016 at 19:10
  • Yes, "$" means the end of the expression, but why don't they use \N in replacements too? Commented Jun 9, 2016 at 19:13
  • Some flavors, such as Python, sed, or Perl, do allow \N in the replacement, but Java's designers chose the $ notation. Commented Jun 9, 2016 at 19:14

3 Answers

10

$1 is not a back reference in Java's regexes, nor in any other flavor I can think of. You only use $1 when you are replacing something:

String input="A12.3 bla bla my input";
input = StringUtils.replacePattern(
            input, "^([A-Z]\\d{2}\\.\\d).*$", "$1");
//                                            ^^^^
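
For readers without Apache Commons on the classpath, here is a minimal JDK-only sketch of the same idea. The class and method names (ReplacementDemo, keepPrefix, replaceLiterally) are hypothetical and exist only for illustration. It shows "$1" acting as a group reference in the replacement string, and Matcher.quoteReplacement for the case where a literal "$" is wanted instead.

```java
import java.util.regex.Matcher;

// Hypothetical demo class, not part of the original answer.
final class ReplacementDemo {

    // "$1" is replacement syntax: it inserts whatever group 1 captured.
    static String keepPrefix(String input) {
        return input.replaceAll("^([A-Z]\\d{2}\\.\\d).*$", "$1");
    }

    // quoteReplacement makes '$' and '\' in the replacement text literal.
    static String replaceLiterally(String input, String regex, String literal) {
        return input.replaceAll(regex, Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        System.out.println(keepPrefix("A12.3 bla bla my input"));    // A12.3
        System.out.println(replaceLiterally("price: X", "X", "$1")); // price: $1
    }
}
```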

There is some misinformation about what a back reference is, including the very place I got that snippet from: simple java regex with backreference does not work.


Java modeled its regex syntax after other existing flavors where the $ was already a meta character. It anchors to the end of the string (or line in multi-line mode).

Similarly, Java uses \1 for back references. Because regexes are written as Java string literals, the backslash itself must be escaped: \\1.
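
A small sketch of the escaped back reference in use, applied to the pattern from the question. The class and method names are made up for this example; only Pattern and its methods come from the JDK.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
final class BackreferenceDemo {

    // The regex engine sees (\d)\1{14}; the Java literal doubles each backslash.
    private static final Pattern SAME_DIGIT_15 = Pattern.compile("(\\d)\\1{14}");

    // True only if the whole string is one digit repeated exactly 15 times.
    static boolean isOneDigitRepeated15Times(String s) {
        return SAME_DIGIT_15.matcher(s).matches();
    }

    public static void main(String[] args) {
        String fifteenFives = "55555" + "55555" + "55555";
        System.out.println(isOneDigitRepeated15Times(fifteenFives));                  // true
        System.out.println(isOneDigitRepeated15Times("55555" + "55555" + "55556"));   // false
    }
}
```

Here matches() requires the entire input to fit the pattern; find() would also accept a longer string that merely contains such a run.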

From a lexical/syntactic standpoint it is true that $1 could be used unambiguously (as a bonus it would prevent the need for the "evil escaped escape" when using back references).

To match a 1 that comes right after the end of a line, the regex would have to be $\n1, which matches text like this:

this line
1
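
To check this in Java, here is a minimal sketch. It assumes Pattern.MULTILINE so that $ can match before a line terminator in the middle of the input; the class name and the find helper are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
final class EndAnchorDemo {

    // Compiles the regex in multi-line mode and reports whether it occurs in the text.
    static boolean find(String regex, String text) {
        return Pattern.compile(regex, Pattern.MULTILINE).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "this line\n1";
        // Regex $\n1: end of line, a newline, then the character 1.
        System.out.println(find("$\\n1", text)); // true
        // Regex $1: end of line immediately followed by 1, which cannot happen here.
        System.out.println(find("$1", text));    // false
    }
}
```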

It just makes more sense to use a familiar syntax instead of changing the rules, most of which came from Perl.

The first version of Perl came out in 1987, which is much earlier than Java, which was released in beta in 1995.

I dug up the man pages for Perl 1, which say:

The bracketing construct (\ ...\ ) may also be used, in which case \<digit> matches the digit'th substring. (Outside of the pattern, always use $ instead of \ in front of the digit. The scope of $<digit> (and $\`, $& and $') extends to the end of the enclosing BLOCK or eval string, or to the next pattern match with subexpressions. The \<digit> notation sometimes works outside the current pattern, but should not be relied upon.) You may have as many parentheses as you wish. If you have more than 9 substrings, the variables $10, $11, ... refer to the corresponding substring. Within the pattern, \10, \11, etc. refer back to substrings if there have been at least that many left parens before the backreference. Otherwise (for backward compatibilty) \10 is the same as \010, a backspace, and \11 the same as \011, a tab. And so on. (\1 through \9 are always backreferences.)

5 Comments

"Java modeled its regex syntax after other existing flavors where the $ was already a meta character. It anchors to the end of the string (or line in multi-line mode)" makes sense. Do you have any source?
@Jaumzera I do now ;)
I don't know what's "evil escaped escape", could you provide a link to it?
@Raining In other regex flavors, you can just have a single escape character: \1. In Java, you must escape that escape: \\1. This is clearly evil.
@Laurel you saved my life. I didn't know that in Java the numeric reference had to be escaped with a double backslash \\... I agree, this is clearly evil haha.
4

I think the main problem is not the backreference, which works perfectly fine with \1 in Java.

Your problem is more likely the overall escaping of a regex pattern in Java.

If you want to have the pattern

(\d)\1{14}

passed to the regex engine, you first need to escape it, because you write it as a Java string literal:

(\\d)\\1{14}

Voilà, works like a charm: goo.gl/BNCx7B (add http://; SO does not allow URL shorteners, but tutorialspoint.com seems to offer no other option)

Offline example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HelloWorld {

    public static void main(String[] args) {
        // One '5' followed by fourteen more: 15 identical digits in total.
        String test = "555555555555555";

        // In a Java string literal, each regex backslash is doubled.
        String pattern = "(\\d)\\1{14}";

        Pattern r = Pattern.compile(pattern);
        Matcher m = r.matcher(test);
        if (m.find()) {
            System.out.println("Matched!");
        } else {
            System.out.println("not matched :-(");
        }
    }
}

4 Comments

Thank you for your attention, @dognose. I do know about String/regex escaping in Java. I realized that I should have put that in the question. I'm editing it right now.
@Jaumzera Just see the example I provided. If the escaped pattern does not work, then your error is somewhere else, not in the pattern. (Are you sure that you have the same digit 15 times, since you said 1 + 14 followers, and not only 14 in total?)
Well, I get your point. But my doubt was about the replacement operator itself, not about the regex. Thanks for your time. +1.
Doesn't work for me if I use ([0-9]{2}-)\\1{2}[0-9]{2} or ([0-9]{2})-\\1-\\1-\\1
0

Backreferences are used to match the same text again.

For example, I used the regex below to match runs of duplicate neighboring characters and trim them, which can be used to fix a typo that occasionally occurs in a word or string.

String text = "aabbbccccdddddefg";
System.out.println(text.replaceAll("(.)\\1*(?=\\1)", ""));

// Outputs: abcdefg

Credit also goes to this answer by dognose for the reminder that a backreference, including one used inside a lookahead such as (?=\1), must have its backslash escaped in a Java string literal (\\1) for things to work properly.
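
As an alternative sketch (not the approach used above), the same result can be reached by combining both notations: \\1 inside the pattern to find a run of repeated characters, and $1 in the replacement to keep a single copy. The class and method names are hypothetical.

```java
// Hypothetical demo class, not part of the original answer.
final class CollapseDemo {

    // (.)\1+ matches a character followed by one or more copies of itself;
    // "$1" in the replacement keeps just one copy of that character.
    static String collapseRuns(String text) {
        return text.replaceAll("(.)\\1+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseRuns("aabbbccccdddddefg")); // abcdefg
    }
}
```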

Additionally, you can check here to learn more about Regex.
