Find and replace (part of) string in comment blocks with regex

Question

I'm trying to find a certain string that can occur inside a comment block. That string can be a whole word, but it can also be part of a word. For instance, suppose I'm looking for "codex": it should be replaced with "bindex", even when it is part of a word such as "codexing", which should become "bindexing".

The trick is that this should only happen when the word is inside a comment block.

/* Lorem ipsum dolor sit amet, codex consectetur adipiscing elit. */

This word --> codex should not be replaced

/* Lorem ipsum dolor sit 
 * amet, codex consectetur 
 * adipiscing elit. 
 */

/** Lorem ipsum dolor sit 
 * amet, codex consectetur 
 * adipiscing elit. 
 */

// Lorem ipsum dolor sit amet, codex consectetur adipiscing elit.

# Lorem ipsum dolor sit amet, codex consectetur adipiscing elit.

------------------- Below "codex" is part of a word -------------------

/* Lorem ipsum dolor sit amet, somecodex consectetur adipiscing elit. */

/* Lorem ipsum dolor sit 
 * amet, codexing consectetur 
 * adipiscing elit. 
 */

And here also, this word --> codex should not be replaced

/** Lorem ipsum dolor sit 
 * amet, testcodexing consectetur 
 * adipiscing elit. 
 */

// Lorem ipsum dolor sit amet, __codex consectetur adipiscing elit.

# Lorem ipsum dolor sit amet, codex__ consectetur adipiscing elit.

What I have so far is this code:

$text = preg_replace ( '~(\/\/|#|\/\*).*?(codex).*?~', '$1 bindex', $text);

As you can see in this example, this isn't really working the way I'd like. It doesn't replace the word when it's inside a multiline /* */ comment block, and sometimes it also removes all the text that was in front of the word "codex".

How can I improve my regex so that it meets my requirements?

anubhava · Accepted Answer · 2013-08-05 20:01:44Z

3

Since you're dealing with multi-line text here, you should use the s modifier (DOTALL) so that . also matches line breaks and the pattern can match across multiple lines. Also, the forward slash doesn't need to be escaped, because ~ is the delimiter.

Try this code:

$text = preg_replace ( '~(//|#|/\*).*?(codex).*?~s', '$1 bindex', $text );
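To see what the s modifier changes, here is a minimal sketch. The sample string and variable names are made up for illustration; only the modifier differs between the two calls.

```php
<?php
// Hypothetical sample: "codex" sits on the second line of a block comment.
$sample = "/* Lorem ipsum\n * amet, codex\n */";

// Without s, "." stops at the first line break, so the pattern cannot reach "codex".
$withoutS = preg_match('~/\*.*?codex~', $sample);

// With s (DOTALL), "." also matches "\n", so the match can span lines.
$withS = preg_match('~/\*.*?codex~s', $sample);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
```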

answered Aug 5, 2013 at 20:01


1 Comment

w00 Over a year ago

Thanks a lot, this seems to be doing exactly what I want! :)

trijin · Accepted Answer · 2013-08-05 21:46:12Z

2

$text = preg_replace ( '~(//|#|/\*)(.*?)(codex).*?~s', '$1$2bindex', $text );

Unlike the answer from anubhava, this does not delete the comment text that comes before 'codex'.
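A small sketch comparing the two replacement strings on made-up input (the sample strings and variable names are assumptions for illustration). The last call also shows a limitation both patterns share: with the s modifier, .*? can run past the end of a line comment and reach a "codex" that is no longer inside it.

```php
<?php
$line = "// Lorem ipsum, codex elit.";

// '$1 bindex' keeps only the comment marker, so the text before "codex" is lost.
$dropped = preg_replace('~(//|#|/\*).*?(codex).*?~s', '$1 bindex', $line);

// '$1$2bindex' re-inserts the captured text between the marker and "codex".
$kept = preg_replace('~(//|#|/\*)(.*?)(codex).*?~s', '$1$2bindex', $line);

// The match is not confined to the comment: "codex" on the next line is code.
$leak = preg_replace('~(//|#|/\*)(.*?)(codex).*?~s', '$1$2bindex', "// note\ncodex();");

echo $dropped, "\n"; // "// bindex elit."
echo $kept, "\n";    // "// Lorem ipsum, bindex elit."
echo $leak, "\n";    // "// note" then "bindex();" on the next line
```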

edited Aug 5, 2013 at 21:46

answered Aug 5, 2013 at 20:22


Casimir et Hippolyte · Accepted Answer · 2021-11-13 22:40:54Z

[EDIT] I edited this answer because, despite my naïve relentlessness at the time, I have to admit that it isn't possible to solve this problem with preg_replace alone, however simple or complicated the pattern! Sorry to the good soul who upvoted my answer. [/EDIT]

To answer the question: it's not possible to improve your pattern, and it's not possible to do this with preg_replace at all! You have to build a pattern for preg_replace_callback that matches a whole comment, and then replace the occurrences of codex inside the callback function.

This version can deal with any type of comment and will not fail on strings such as /**/ codex /**/ or /*xxxx codex codex xxxx*/ or any other traps.

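$result = preg_replace_callback('~/\*.*?\*/|#\N+|//\N+~s', function ($m) {
    // $m[0] is one whole comment; replace every occurrence inside it.
    // str_ireplace is case-insensitive; use str_replace for an exact-case match.
    return str_ireplace('codex', 'bindex', $m[0]);
}, $subject);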

Note that, besides being simpler, this pattern is also efficient: each branch of the alternation is "anchored" because it starts with a literal character. The pattern therefore benefits from automatic optimizations.
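A runnable sketch of the callback approach on made-up input (the sample text is an assumption for illustration, not the asker's file). Occurrences inside comments change, including several in the same comment, while code outside comments is left alone.

```php
<?php
// Nowdoc, so $codex is plain text here and not an interpolated variable.
$subject = <<<'TXT'
/* first codex, second codexing */
$codex = 1; // testcodexing here
# codex__ in a hash comment
codex();
TXT;

// Match one whole comment at a time, then replace inside the matched text only.
$result = preg_replace_callback('~/\*.*?\*/|#\N+|//\N+~s', function ($m) {
    return str_ireplace('codex', 'bindex', $m[0]);
}, $subject);

echo $result;
// /* first bindex, second bindexing */
// $codex = 1; // testbindexing here
// # bindex__ in a hash comment
// codex();
```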

EZLearner · Accepted Answer · 2013-08-05 20:00:09Z

0

As has been written hundreds, thousands or maybe even millions of times before in different comments, regular expressions are NOT for parsing code, or for searching for errors in it.

Consider these examples:

// code to be replaced
var a = "/*code to be replaced*/";

/* code to be replaced
var b = "*/code to be replaced"; */

There is no way for you to parse the code (and yes, finding out if a string is inside a comment block is called parsing) with REGEX.

Find a parser library, or write a minimal one of your own. If you do write one, remember all the different use cases of the script and, in particular, how string literals will affect your code.
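As a rough illustration of what a minimal scanner could look like, the sketch below tokenizes string literals and comments in one pass, so that comment markers inside a string are never treated as comments. The function name replaceInComments is hypothetical, and the code assumes C-style quoting rules with backslash escapes; it is not a full parser and will misread input that uses other string syntaxes or stray apostrophes outside comments.

```php
<?php
// Hypothetical helper, not part of any library.
function replaceInComments(string $source, string $search, string $replace): string
{
    // Group 1: double- or single-quoted string literal with backslash escapes.
    $string = '"(?:[^"\\\\]|\\\\.)*"|\'(?:[^\'\\\\]|\\\\.)*\'';
    // Group 2: block comment, // line comment, or # line comment.
    $comment = '/\*.*?\*/|//\N*|#\N*';

    return preg_replace_callback(
        '~(' . $string . ')|(' . $comment . ')~s',
        function (array $m) use ($search, $replace): string {
            // $m[2] exists only when the comment group matched; strings pass through.
            return isset($m[2]) ? str_replace($search, $replace, $m[2]) : $m[0];
        },
        $source
    ) ?? $source; // fall back to the input if PCRE reports an error
}

$source = <<<'SRC'
var a = "/* codex */"; // codex
/* codex "quoted" */ codex
SRC;

echo replaceInComments($source, 'codex', 'bindex');
// var a = "/* codex */"; // bindex
// /* bindex "quoted" */ codex
```

Because strings are listed first in the alternation and the scan proceeds left to right, whichever token starts first wins, which is what keeps "/* ... */" inside a string from being rewritten.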

answered Aug 5, 2013 at 20:00


1 Comment

w00 Over a year ago

I am not parsing code, I'm searching for a string that is preceded by (/*|//|#). Nothing a modern regex language can't do, as is obviously proven by one of the given answers. This is not HTML or XML that I'm trying to parse, or anything along those lines.

Robadob · Accepted Answer · 2013-08-05 20:05:20Z

0

Something like this, using subgroups, should work:

// Note: '$1$3' drops MYWORD; put the new text between $1 and $3 to substitute it instead.
$str = preg_replace(
    '~(<!--[a-zA-Z0-9 \n]*)(MYWORD)([a-zA-Z0-9 \n]*-->)~s',
    '$1$3',
    $input
);

You will just need to create a separate rule for each type of comment, and limit the characters allowed inside the comment with a character class (you might prefer to use a negated character class).

edited Aug 5, 2013 at 20:05

answered Aug 5, 2013 at 19:57


3 Comments

Martin Ender Over a year ago

That is for HTML comments and won't replace more than one codex per comment block. It doesn't cover line breaks either.

Robadob Over a year ago

Call it again until it returns no matches.

Robadob Over a year ago

If you want to allow line breaks in comments, you will need to add '\n' to the character class. You should really do something like this with full code rather than trying to shortcut it with regex. By using .* you run the risk of MYWORD being identified and replaced.
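The "call it again until it returns no matches" idea from the comments above can be sketched as a loop around preg_replace, using its by-reference count argument. The pattern below is an assumption adapted to /* */ comments rather than the HTML comments in the answer, and the sample text is made up; each pass replaces at most one occurrence per comment.

```php
<?php
$text = "/* codex and codexing */ codex";

// (?:(?!\*/).) matches one character at a time but never steps over the closing "*/",
// so both captured groups stay inside a single block comment.
$pattern = '~(/\*(?:(?!\*/).)*?)codex((?:(?!\*/).)*\*/)~s';

do {
    // $count receives the number of replacements made in this pass.
    $text = preg_replace($pattern, '$1bindex$2', $text, -1, $count);
} while ($count > 0);

echo $text; // /* bindex and bindexing */ codex
```

The loop terminates because the replacement text does not itself contain the search word.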
