I have the following Java code:

String initial = "Phone number: [194-582-9412]";
System.out.println(initial.replaceAll("\\d{3}\\-\\d{3}(?=\\-\\d{4})","XXX-XXX"));
System.out.println(initial.replaceAll("\\d{3}\\-\\d{3}(?:\\-\\d{4})","XXX-XXX"));

which produces output:

Phone number: [XXX-XXX-9412]
Phone number: [XXX-XXX]

My logic was to find 3 digits, a dash, 3 digits (capturing to this point), a dash, and four digits (non-capturing to this point). According to this tutorial, lookahead groups starting with ?= are non-capturing. According to the Pattern Javadoc, groups beginning with ?: are also non-capturing. I expected both regular expressions to produce the same output, Phone number: [XXX-XXX-9412]. However, the regular expression with the non-capturing group (?:\\-\\d{4}) seems to capture the entire phone number and replace it. Why is this happening?

  • Any reason to not just capture the phone number itself as a group? Seems like a double negative (replacing what you don't want, vs grabbing what you do) Commented Aug 26, 2019 at 20:50
  • Lookarounds do not consume characters in the string. A non-capturing group does consume characters in the string, but does not create a capturing group. Read about lookarounds and about grouping and capturing. Commented Aug 26, 2019 at 20:54
  • @Rogue I need phone numbers (which will always be in the same format) to be masked before sending the relevant part of the data to the user, if they don't have the security credentials for that information. I'm just wondering about the weird behavior of ?: here, because it was the first option that came to mind, and it didn't work as expected. Commented Aug 26, 2019 at 20:54
  • @Thefourthbird Alright, I think I understand now. The bit about consuming characters (as opposed to capturing groups) was most helpful for me. Commented Aug 26, 2019 at 21:02
  • It isn't capturing. But you are replacing everything that matches, which includes the text matched by the non-capturing group. The first one worked not because the lookahead is non-capturing, but because it is zero-width. Commented Aug 26, 2019 at 21:02
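To see the difference the comments describe, the small sketch below prints the text and span of the first match for each of the question's patterns, along with `Matcher.groupCount()`. The class and method names (`MatchSpanDemo`, `describeFirstMatch`) are illustrative, not from the thread, and the patterns use `-` instead of `\\-`; outside a character class the two mean the same thing in Java regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchSpanDemo {
    // Returns "matched text [start, end) groups=N" for the first match, or null if none.
    static String describeFirstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return m.group() + " [" + m.start() + ", " + m.end() + ") groups=" + m.groupCount();
    }

    public static void main(String[] args) {
        String initial = "Phone number: [194-582-9412]";
        // Lookahead: requires "-dddd" to follow, but does not consume it (zero-width).
        System.out.println(describeFirstMatch("\\d{3}-\\d{3}(?=-\\d{4})", initial));
        // Non-capturing group: "-dddd" is consumed as part of the match, just without a group number.
        System.out.println(describeFirstMatch("\\d{3}-\\d{3}(?:-\\d{4})", initial));
    }
}
```

This should print `194-582 [15, 22) groups=0` and then `194-582-9412 [15, 27) groups=0`. Both patterns report zero capturing groups, so capturing is not what differs; the non-capturing version simply matches five more characters, and `replaceAll` replaces the entire match.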

1 Answer

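You can actually do what you wanted using a capturing group. Here the group captures the part you want to keep (the final dash and four digits), and the whole match is replaced. The $1 in the replacement string refers to the text captured by group 1, so that part is written back unchanged.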

 // Group 1 captures "-9412"; $1 writes it back after the mask
 System.out.println(
         initial.replaceAll("\\d{3}-\\d{3}(-\\d{4})", "XXX-XXX$1"));
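A complete, runnable version of this approach might look like the following; `MaskPhone`, `mask`, and `maskWithLookahead` are hypothetical names chosen for the sketch. It also runs the question's lookahead pattern so the two approaches can be compared on the same input.

```java
public class MaskPhone {
    // Capture-group version: group 1 holds "-dddd", and $1 re-inserts it after the mask.
    static String mask(String text) {
        return text.replaceAll("\\d{3}-\\d{3}(-\\d{4})", "XXX-XXX$1");
    }

    // Lookahead version from the question: "-dddd" is checked but is never part of the match.
    static String maskWithLookahead(String text) {
        return text.replaceAll("\\d{3}-\\d{3}(?=-\\d{4})", "XXX-XXX");
    }

    public static void main(String[] args) {
        String initial = "Phone number: [194-582-9412]";
        System.out.println(mask(initial));              // Phone number: [XXX-XXX-9412]
        System.out.println(maskWithLookahead(initial)); // Phone number: [XXX-XXX-9412]
    }
}
```

Both produce the same output; the capture-group version consumes the last four digits and puts them back, while the lookahead version never touches them.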

And I presume you realize that if the regex doesn't match, the original string is returned unchanged.
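Because the masking here is a security measure, an input that doesn't match passes through with nothing hidden. One option is to check for a match explicitly instead of relying on that silent behavior. The sketch below assumes the `ddd-ddd-dddd` format from the question; `NoMatchDemo`, `mask`, and `containsPhone` are hypothetical names.

```java
import java.util.regex.Pattern;

public class NoMatchDemo {
    // Assumed format from the question: ddd-ddd-dddd, with group 1 holding "-dddd".
    private static final Pattern PHONE = Pattern.compile("\\d{3}-\\d{3}(-\\d{4})");

    // Returns the masked text, or the input unchanged when nothing matches.
    static String mask(String text) {
        return PHONE.matcher(text).replaceAll("XXX-XXX$1");
    }

    // True if the text contains at least one ddd-ddd-dddd number.
    static boolean containsPhone(String text) {
        return PHONE.matcher(text).find();
    }

    public static void main(String[] args) {
        String unlisted = "Phone number: [unlisted]";
        System.out.println(mask(unlisted));          // Phone number: [unlisted]
        System.out.println(containsPhone(unlisted)); // false
    }
}
```

What to do when `containsPhone` returns `false` for a field that should hold a number (reject it, log it, or hide the whole field) is an application decision the thread doesn't address.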
