Replace multiple capture groups using regexp with java

Question

I have this requirement - for an input string such as the one shown below

8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs

I would like to strip the matching boundary characters from each word (where the matching pair is 8, &, %, etc.), which should result in the following:

This is reallly a test of repl%acing %mul%tiple matched 9pairs

The list of characters used for the pairs can vary (e.g. 8, 9, %, #), and only words that start and end with the same such character should be stripped of those characters; the same character embedded inside the word stays where it is.

Using Java I can use the pattern \\b8([^\\s]*)8\\b with the replacement $1 to capture and replace all occurrences of 8...8, but how do I do this for all the types of pairs?

I can provide a pattern such as \\b8([^\\s]*)8\\b|\\b9([^\\s]*)9\\b and so on, which will match all types of matching pairs (8, 9, ...), but how do I specify a 'variable' replacement group?

E.g. if the match is 9...9, then the replacement should be $2.

I can of course run it through multiple of these, each replacing a specific type of pair, but I am wondering if there is a more elegant way.

Or is there a completely different way of approaching this problem?

Thanks.

Avinash Raj · Accepted Answer · 2014-12-11 05:59:44Z

4

You could use the regex below and replace each match with the contents of capture group 2.

(?<!\S)(\S)(\S+)\1(?=\s|$)

OR

(?<!\S)(\S)(\S*)\1(?=\s|$)

As a Java string literal (with the backslashes escaped), the regex would be:

(?<!\\S)(\\S)(\\S+)\\1(?=\\s|$)

String s1 = "8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs";
System.out.println(s1.replaceAll("(?<!\\S)(\\S)(\\S+)\\1(?=\\s|$)", "$2"));

Output:

This is reallly a test of repl%acing %mul%tiple matched 9pairs
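
The two variants at the top of this answer differ only in `\S+` versus `\S*`. The following is a small sketch of that difference, not part of the original answer; the class and method names (`VariantDemo`, `stripPlus`, `stripStar`) are made up for illustration. With `\S*`, a token consisting of just the two delimiters (such as `88`) is matched and replaced by an empty string, while `\S+` leaves it alone.

```java
class VariantDemo {
    // Variant with \S+ : requires at least one character between the pair.
    static String stripPlus(String input) {
        return input.replaceAll("(?<!\\S)(\\S)(\\S+)\\1(?=\\s|$)", "$2");
    }

    // Variant with \S* : also matches an "empty" pair such as "88".
    static String stripStar(String input) {
        return input.replaceAll("(?<!\\S)(\\S)(\\S*)\\1(?=\\s|$)", "$2");
    }

    public static void main(String[] args) {
        System.out.println("[" + stripPlus("a 88 b") + "]"); // "88" is kept
        System.out.println("[" + stripStar("a 88 b") + "]"); // "88" is removed
    }
}
```

Which variant is appropriate depends on whether a bare pair of delimiters should count as a wrapped (empty) word in your data.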

Explanation:

(?<!\\S) Negative lookbehind: asserts that the match is not preceded by a non-space character, i.e. it starts at the beginning of the input or right after whitespace.
(\\S) Captures the first non-space character and stores it in group 1.
(\\S+) Captures one or more non-space characters and stores them in group 2.
\\1 Backreference: matches the same character that group 1 captured.
(?=\\s|$) Positive lookahead: the match must be followed by whitespace or the end of the input.
Together these make sure that the first and last characters of a whitespace-delimited word are the same. If so, the whole match is replaced by the characters captured in group 2.
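
To see what each group captures, a sketch along the following lines may help. It walks the matches with `Matcher.find()` and records group 1 (the delimiter) and group 2 (the text that survives the replacement). The class name `PairGroupsDemo` and the helper `describe` are illustrative names, not from the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairGroupsDemo {
    // Same pattern as explained above.
    static final Pattern PAIR = Pattern.compile("(?<!\\S)(\\S)(\\S+)\\1(?=\\s|$)");

    // Returns one "delimiter -> inner text" line per match.
    static String describe(String input) {
        StringBuilder sb = new StringBuilder();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            sb.append(m.group(1)).append(" -> ").append(m.group(2)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s1 = "8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs";
        System.out.print(describe(s1));
    }
}
```

Because group 1 is `\S`, any non-space character can act as a delimiter here, so an ordinary word whose first and last letters coincide (for example `that`) would also be matched and reduced to `ha`. Restricting group 1 to a known set of characters avoids this.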

For this specific case, you could restrict the first group to the delimiter characters you expect:

String s1 = "8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs";
System.out.println(s1.replaceAll("(?<!\\S)([89&#%])(\\S+)\\1(?=\\s|$)", "$2"));
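
Since the question says the list of pair characters can vary, one option is to build the first group from that list at runtime. The sketch below is an assumption about how that could look, not something from the original answer; `PairStripper`, `patternFor` and `strip` are hypothetical names. It quotes each delimiter with `Pattern.quote` so that characters with special meaning in a regex (such as `$` or `*`) are treated literally, and it assumes each delimiter is a single `char`.

```java
import java.util.regex.Pattern;

class PairStripper {
    // Builds the pair-matching pattern from a string of delimiter characters.
    static Pattern patternFor(String delimiters) {
        StringBuilder alternatives = new StringBuilder();
        for (int i = 0; i < delimiters.length(); i++) {
            if (i > 0) {
                alternatives.append('|');
            }
            // Quote each delimiter so regex metacharacters are matched literally.
            alternatives.append(Pattern.quote(String.valueOf(delimiters.charAt(i))));
        }
        return Pattern.compile("(?<!\\S)(" + alternatives + ")(\\S+)\\1(?=\\s|$)");
    }

    // Replaces every wrapped word with its inner text (group 2).
    static String strip(String input, String delimiters) {
        return patternFor(delimiters).matcher(input).replaceAll("$2");
    }

    public static void main(String[] args) {
        String s1 = "8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs";
        System.out.println(strip(s1, "89&#%"));
    }
}
```

If the same delimiter set is used repeatedly, compiling the pattern once and reusing it avoids rebuilding it on every call.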

edited Dec 11, 2014 at 5:59

answered Dec 11, 2014 at 5:00

Avinash Raj

2 Comments

ssen Over a year ago

Thanks. Using the backreference and replacing with capture group 2, as suggested by you and another person, seems to have nailed it. I am using the following: (?<!\S)(8|9|&|#|%)(\S+)\1(?=\s|$) where the first capture group contains the list of all the characters that can be part of the paired pattern.

Avinash Raj Over a year ago

@ssen Exactly, you got it. A more compact version: (?<!\S)([89&#%])(\S+)\1(?=\s|$) regex101.com/r/qB0jV1/19

vks · Accepted Answer · 2014-12-11 05:04:58Z

1

(?<![a-zA-Z])[8&#%9](?=[a-zA-Z])([^\s]*?)(?<=[a-zA-Z])[8&#%9](?![a-zA-Z])

Try this. Replace with $1 (Java's replacement syntax; some other engines use \1). See demo.

https://regex101.com/r/qB0jV1/15

(?<![a-zA-Z])[^a-zA-Z](?=[a-zA-Z])([^\s]*?)(?<=[a-zA-Z])[^a-zA-Z](?![a-zA-Z])

Use this if you have many delimiters.
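
A sketch of how the first pattern in this answer could be used from Java follows; the class name `LetterBoundaryDemo` and the method `strip` are illustrative. Unlike the backreference approach, this pattern does not require the opening and closing delimiters to be the same character, so a token such as `8This9` would be stripped as well.

```java
class LetterBoundaryDemo {
    // First pattern from this answer, written as a Java string literal.
    static final String REGEX =
        "(?<![a-zA-Z])[8&#%9](?=[a-zA-Z])([^\\s]*?)(?<=[a-zA-Z])[8&#%9](?![a-zA-Z])";

    // Replaces each match with the text captured between the delimiters.
    static String strip(String input) {
        return input.replaceAll(REGEX, "$1");
    }

    public static void main(String[] args) {
        String s1 = "8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs";
        System.out.println(strip(s1));
        // Mixed delimiters are also accepted by this pattern.
        System.out.println(strip("8This9"));
    }
}
```

Note that `[^a-zA-Z]` in the broader variant also matches a space, so it can consume the whitespace around a plain word; it is worth testing that variant on real input before relying on it.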

answered Dec 11, 2014 at 5:04

vks
