
In a very large string, I have to delete every [w:r]...[/w:r] element that contains the substring "delete". Here is an example of an element I want to delete:

[w:r w:rsidR="00A37EED" w:rsidRPr="00FE1BE1"][w:rPr][w:b][/w:rPr][w:t]delete[/w:t][/w:r]

My best guess is \[w:r.*delete.*\[\/w:r\]. I have tried several regular expressions, but regex is not my strong suit.

I copy-pasted the string into regex101; here's the link: https://regex101.com/r/wS4bL2/1

My pattern finds the required text, but I can't make it stop at the first occurrence of [/w:r].

PHP code, in case you are wondering:

$this->tempDocumentMainPart = preg_replace('/\[w:r.*delete.*\[\/w:r\]/','',$this->tempDocumentMainPart);

1 Answer


The greedy .* runs past the closing [/w:r] and across the neighbouring [...] elements. One way to prevent that is to use a tempered greedy token:

\[w:r\b(?:(?!\[w:r\b).)*?delete(?:(?!\[w:r\b).)*?\[\/w:r]
        ^^^^^^^^^^^^^^^^^       ^^^^^^^^^^^^^^^^^

See the regex demo

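The (?:(?!\[w:r\b).)*? tempered greedy token matches any character that does not start another [w:r, so a match cannot extend from one [w:r element into the next. The word boundary \b on the right means that tags such as [w:rPr] do not count as the start of a new [w:r.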

Add the DOTALL modifier s ('/PATTERN/s') so that . also matches newlines.
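To make the difference concrete, here is a minimal sketch that runs both the pattern from the question and the tempered pattern on a short sample. The sample string and the variable names are made up for illustration, and ~ is used as the delimiter only to keep the pattern readable.

```php
<?php
// Hypothetical sample: three [w:r] elements, only the middle one contains "delete".
$doc = '[w:r][w:t]keep[/w:t][/w:r]'
     . '[w:r w:rsidR="00A37EED" w:rsidRPr="00FE1BE1"][w:rPr][w:b][/w:rPr][w:t]delete[/w:t][/w:r]'
     . '[w:r][w:t]also keep[/w:t][/w:r]';

// Pattern from the question: the greedy .* spans from the first [w:r to the last [/w:r].
$greedy = '/\[w:r.*delete.*\[\/w:r\]/';

// Tempered greedy token pattern, with the s (DOTALL) modifier so . also matches newlines.
$tempered = '~\[w:r\b(?:(?!\[w:r\b).)*?delete(?:(?!\[w:r\b).)*?\[\/w:r]~s';

$greedyResult   = preg_replace($greedy, '', $doc);
$temperedResult = preg_replace($tempered, '', $doc);

var_dump($greedyResult);        // string(0) "" : the whole sample was removed
echo $temperedResult, PHP_EOL;  // [w:r][w:t]keep[/w:t][/w:r][w:r][w:t]also keep[/w:t][/w:r]
```

On this sample the greedy pattern removes everything, while the tempered pattern removes only the element that contains "delete".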


4 Comments

It is hard to boost this regex's performance by unrolling the pattern as '~\[w:r\b[^[]*(?:\[(?!w:r\b)[^[]*)*delete[^[]*(?:\[(?!w:r\b)[^[]*)*?\[\/w:r]~' because the input is full of square brackets :( But it is still faster.
it seems great, let me try this on multiple examples and I'll be back with the green check
I suggest using the unrolled variant, it is faster and is thus more reliable.
yea that's what I used
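For reference, a small sketch of how the unrolled variant from the comments might be wrapped in PHP (7.0 or later, because of the type declarations). The function name removeDeleteRuns is hypothetical. The null check is an assumption about what "more reliable" could mean in practice: preg_replace returns null when PCRE fails, for example when the backtrack limit is hit on a very large input, and a pattern that backtracks less is less likely to get there.

```php
<?php
// Hypothetical helper: removes every [w:r]...[/w:r] element containing "delete".
function removeDeleteRuns(string $doc): string
{
    // Unrolled pattern from the comments. The negated class [^[] already
    // matches newlines, so no s (DOTALL) modifier is needed here.
    $unrolled = '~\[w:r\b[^[]*(?:\[(?!w:r\b)[^[]*)*delete[^[]*(?:\[(?!w:r\b)[^[]*)*?\[\/w:r]~';

    $result = preg_replace($unrolled, '', $doc);

    // preg_replace returns null on a PCRE error; fail loudly instead of
    // silently replacing the document with null.
    if ($result === null) {
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }

    return $result;
}

$sample = "[w:r][w:t]keep[/w:t][/w:r][w:r]\n[w:rPr][w:b][/w:rPr][w:t]delete[/w:t]\n[/w:r]";
echo removeDeleteRuns($sample), PHP_EOL; // [w:r][w:t]keep[/w:t][/w:r]
```

In the code from the question, the call would then look like $this->tempDocumentMainPart = removeDeleteRuns($this->tempDocumentMainPart);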
