
I'm new to regex and have been trying to work this out on my own, but I can't get it working. I have an input that contains start and end flags, and I want to replace a certain character, but only if it's between the flags.

So, for example, the start flag is START, the end flag is END, the character I'm trying to replace is ", and I want to replace it with \".

I would then call input.replaceAll(regex, "\\\\\"");

I tried writing a regex that matches only the right " characters, but so far I've only managed to match all the characters between the flags, not just the " characters: (?<=START)(.*)(?=END)

Example input:

This " is START an " example input END string ""
START This is a "" second example END
This" is "a START third example END " "

Expected output:

This " is START an \" example input END string ""
START This is a \"\" second example END
This" is "a START third example END " "
  • I'm a little confused by how it's worded. What exactly are you trying to replace: START, END, and everything in between? Or just some specific characters between START and END? Commented Aug 21, 2022 at 1:23
  • Only the quotes between START and END and nothing else; any quotes that are not between START and END should be left alone. Commented Aug 21, 2022 at 1:33
  • You could do something like this: (?<=START).*(").*(?=END) and replace the first capture group. I'm not great with regex, but that's how I would approach it. Commented Aug 21, 2022 at 2:23
  • example of using named groups and replace Commented Aug 21, 2022 at 2:27
  • Thank you for your answer, but your suggestion only seems to catch the last quote between START and END in a group, and it skips any quotes that come before it. Commented Aug 21, 2022 at 3:02

3 Answers


Find all characters between START and END, and for those characters replace " with \".

To achieve this, apply a replacer function to all matches of characters between START and END:

string = Pattern.compile("(?<=START).*?(?=END)").matcher(string)
    .replaceAll(mr -> mr.group().replace("\"", "\\\\\""));

which produces your expected output.
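A self-contained version of the snippet above, applied to the three example lines from the question. The class and method names (EscapeBetweenFlags, escapeQuotes) are illustrative only; the pattern and replacer are the ones from this answer, and Matcher.replaceAll(Function) assumes Java 9 or later.

```java
import java.util.regex.Pattern;

class EscapeBetweenFlags {
    // Reluctant match of the text between START and END; the lookarounds keep
    // the flags themselves out of the match.
    private static final Pattern BETWEEN = Pattern.compile("(?<=START).*?(?=END)");

    static String escapeQuotes(String input) {
        // Matcher.replaceAll(Function<MatchResult, String>) requires Java 9+.
        // The returned String is itself a replacement string, so "\\\\\""
        // (the characters \\") ends up as a literal \".
        return BETWEEN.matcher(input)
                .replaceAll(mr -> mr.group().replace("\"", "\\\\\""));
    }

    public static void main(String[] args) {
        String[] lines = {
            "This \" is START an \" example input END string \"\"",
            "START This is a \"\" second example END",
            "This\" is \"a START third example END \" \""
        };
        for (String line : lines) {
            System.out.println(escapeQuotes(line));
        }
    }
}
```

Each line is processed on its own here; since . does not match a line terminator by default, the same pattern also keeps matches within a line when the input is one multi-line string.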

Some notes on how this works.

The first step is to match all characters between START and END, using lookarounds with a reluctant quantifier:

(?<=START).*?(?=END)

The ? after the .* changes the match from greedy (as many chars as possible while still matching) to reluctant (as few chars as possible while still matching). This prevents the middle quote in the following input from being altered:

START a"b END c"d START e"f END

A greedy quantifier will match from the first START all the way past the next END to the last END, incorrectly including c"d.
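To see the difference concretely, the sketch below runs the same quote escaping with the greedy pattern and with the reluctant one on the input above. GreedyVsReluctant and escapeWith are hypothetical names used only for this comparison (Java 9+ for the replacer overload).

```java
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Escape every " inside each match of the given lookaround pattern.
    static String escapeWith(String regex, String input) {
        return Pattern.compile(regex).matcher(input)
                .replaceAll(mr -> mr.group().replace("\"", "\\\\\""));
    }

    public static void main(String[] args) {
        String input = "START a\"b END c\"d START e\"f END";
        // Greedy: a single match from the first START to the last END, so c"d is escaped too.
        System.out.println(escapeWith("(?<=START).*(?=END)", input));
        // Reluctant: two separate matches, so c"d is left alone.
        System.out.println(escapeWith("(?<=START).*?(?=END)", input));
    }
}
```

The first line printed is START a\"b END c\"d START e\"f END and the second is START a\"b END c"d START e\"f END.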

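The next step is, for each match, to replace " with \". The full match is group 0, available as MatchResult#group(), and we don't need regex for this replacement: a plain String#replace is enough (and yes, replace() replaces all occurrences). Note that the string returned by the replacer function is itself treated as a replacement string, in which \ is an escape character, which is why the Java literal is "\\\\\"" (the characters \\") rather than "\\\"".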
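One consequence of the replacer's result being a replacement string: if the text between the flags contains $ or \, it is interpreted as replacement syntax. For example, with START costs $1 "each" END the original snippet would throw an IndexOutOfBoundsException, because the pattern has no group 1. A slightly more defensive variant (a sketch, not part of the original answer; the class name is hypothetical) builds the literal text first and wraps it in Matcher.quoteReplacement:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedReplacement {
    private static final Pattern BETWEEN = Pattern.compile("(?<=START).*?(?=END)");

    static String escapeQuotes(String input) {
        return BETWEEN.matcher(input).replaceAll(mr ->
                // Build the literal text we want (\" for each "), then quote it so any
                // $ or \ in the matched text is inserted as-is instead of being parsed.
                Matcher.quoteReplacement(mr.group().replace("\"", "\\\"")));
    }

    public static void main(String[] args) {
        System.out.println(escapeQuotes("START costs $1 \"each\" END"));
    }
}
```

This prints START costs $1 \"each\" END. With quoteReplacement in place, the single-escaped literal "\\\"" is what you want, since no further unescaping happens.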


1 Comment

This works perfectly. I did have to modify it because the version of Java I'm using doesn't support lambdas, but it works great, thank you!
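The modified version mentioned in this comment wasn't posted. As a sketch of one way to do it without lambdas (and without Matcher.replaceAll(Function), which only exists from Java 9), the same reluctant pattern can be driven by the classic find/appendReplacement/appendTail loop. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeWithoutLambdas {
    private static final Pattern BETWEEN = Pattern.compile("(?<=START).*?(?=END)");

    static String escapeQuotes(String input) {
        Matcher m = BETWEEN.matcher(input);
        // StringBuffer, because the StringBuilder overloads of
        // appendReplacement/appendTail were only added in Java 9.
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Literal text we want for this match: each " becomes \".
            String escaped = m.group().replace("\"", "\\\"");
            // quoteReplacement so the escaped text is appended literally.
            m.appendReplacement(sb, Matcher.quoteReplacement(escaped));
        }
        // Copy whatever follows the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeQuotes("This \" is START an \" example input END string \"\""));
    }
}
```

It prints This " is START an \" example input END string "", the same as the lambda version.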

For now I've been able to solve it by using two capture groups and repeatedly replacing the match until there are no matches left. I even had to insert a placeholder for the replacement, because replacing with \" would leave the " character in place, so the pattern would keep matching it and create an infinite loop. Once there are no matches left, I replace the placeholder, and I'm now getting the expected result.

I still feel like there has to be a much cleaner way to do this using only one replace statement...

Code that worked for me:

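class Playground {
    public static void main(String[] args) {
        String input = "\"ThSTARTis is a\" te\"\"stEND \" !!!";

        // Group 1: everything through START, up to the last quote that still has END after it.
        // Group 2: the rest of the line, including END.
        String regex = "(.*START.+)\"+(.*END.*)";

        // Replace one quote per pass with a placeholder. Replacing directly with \"
        // would leave the quote in place, so the loop would never terminate.
        while (input.matches(regex)) {
            input = input.replaceAll(regex, "$1---replace---$2");
        }

        // Swap the placeholder for a literal \" (the Java literal "\\\"" is \").
        String result = input.replace("---replace---", "\\\"");

        System.out.println(result);
    }
}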

Output:

"ThSTARTis is a\" te\"\"stEND " !!!

I would love any suggestions as to how I could solve this in a better/cleaner way.


Another option is to use the \G anchor with two capture groups. In the replacement, use both capture groups followed by \".

(?:(START)(?=.*END)|\G(?!^))((?:(?!START|END)(?>\\+\"|[^\r\n\"]))*)\"

Explanation

  • (?: Non capture group
    • (START)(?=.*END) Capture group 1, match START and assert there is END to the right
    • | Or
    • \G(?!^) Assert the current position is at the end of the previous match (and not at the start of a line)
  • ) Close non capture group
  • ( Capture group 2
    • (?: Non capture group
      • (?!START|END) Negative lookahead, assert that neither START nor END is directly to the right
      • (?>\\+\"|[^\r\n\"]) Atomic group: match one or more \ followed by " (an already escaped quote), or match any character except " or a newline
    • )* Close the non capture group and repeat it zero or more times
  • ) Close group 2
  • \" Match "
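A side effect of the (?>\\+\") branch, not spelled out above: quotes that are already escaped between START and END are consumed as part of group 2 and left as they are, whereas the lookaround approach from the other answer would turn an existing \" into \\". A small check reusing this answer's pattern (the class and method names are hypothetical):

```java
import java.util.regex.Pattern;

class GAnchorEscape {
    // The pattern from the answer, written as a Java string literal.
    private static final Pattern P = Pattern.compile(
            "(?:(START)(?=.*END)|\\G(?!^))((?:(?!START|END)(?>\\\\+\\\"|[^\\r\\n\\\"]))*)\\\"",
            Pattern.MULTILINE);

    static String escapeQuotes(String input) {
        // $1 restores START (an unmatched group is replaced with nothing on \G
        // continuations), $2 the text before the quote, then a literal \".
        return P.matcher(input).replaceAll("$1$2\\\\\"");
    }

    public static void main(String[] args) {
        // Text: START a \"b\" and "c" END  (the first two quotes are already escaped)
        System.out.println(escapeQuotes("START a \\\"b\\\" and \"c\" END"));
    }
}
```

This prints START a \"b\" and \"c\" END: only the two unescaped quotes around c are changed.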

See a Java regex demo and a Java demo

For example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String regex = "(?:(START)(?=.*END)|\\G(?!^))((?:(?!START|END)(?>\\\\+\\\"|[^\\r\\n\\\"]))*)\\\"";
        String string = "This \" is START an \" example input END string \"\"\n"
                + "START This is a \"\" second example END\n"
                + "This\" is \"a START third example END \" \"";
        // Group 1 (START or nothing), group 2 (text before the quote), then a literal \"
        String subst = "$1$2\\\\\"";

        Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(string);

        String result = matcher.replaceAll(subst);

        System.out.println(result);
    }
}

Output

This " is START an \" example input END string ""
START This is a \"\" second example END
This" is "a START third example END " "

1 Comment

Unfortunately I already used another answer, but this looks great and doesn't use lambdas, so I wouldn't have had to rewrite it :) Great explanation as well.
