
I would like to use PHP's preg_replace() to search a text for occurrences of a certain word, and enclose that word in brackets, unless there are already brackets present. The challenge here is that I want to test for brackets that may or may not be directly adjacent to the text I am looking for.

Random example: I want to replace warfarin with [[warfarin]]

  1. in this string: Use warfarin for the prevention of strokes
  2. but not in this string: Use [[warfarin]] for the prevention of strokes (brackets already present)
  3. and not in this string either: Use [[generic warfarin formulation]] for the prevention of strokes ('remote' brackets already present)

I can satisfy the first two requirements all right using lookbehind and lookahead assertions:

php > echo preg_replace( "/(?<!\[\[)(warfarin)(?!]])/", "[[$1]]", "Use warfarin for the prevention of strokes" );
Use [[warfarin]] for the prevention of strokes
php > echo preg_replace( "/(?<!\[\[)(warfarin)(?!]])/", "[[$1]]", "Use [[warfarin]] for the prevention of strokes" );
Use [[warfarin]] for the prevention of strokes

But I need your help with the third requirement, i.e. not adding brackets when there are 'remote' brackets present:

php > echo preg_replace( "/(?<!\[\[)(warfarin)(?!]])/", "[[$1]]", "Use [[generic warfarin formulation]] for the prevention of strokes" );
Use [[generic [[warfarin]] formulation]] for the prevention of strokes

In this last example, the square brackets should not be added to the word warfarin since it is contained in a longer expression that is already enclosed in brackets.

The problem is that lookbehind assertions in PHP's (PCRE) regular expressions must have a fixed length; otherwise this would be very simple.

I'm using

PHP 5.3.10-1ubuntu3.1 with Suhosin-Patch (cli) (built: May  4 2012 02:20:36)

Thanks in advance!

2 Answers

2

This is what I would do.

$str = 'Use warfarin for the prevention of strokes. ';
$str .= 'Use [[warfarin]] for the prevention of strokes. ';
$str .= 'Use [[generic warfarin formulation]] for the prevention of strokes';
// split the string by [[...]] groups, keeping the groups in the result
$arr = preg_split('/(\[\[.*?\]\])/',$str,-1,PREG_SPLIT_DELIM_CAPTURE);
// even indexes hold the plain text parts, odd indexes the [[...]] groups
for ($i = 0; $i < count($arr); $i+=2) {
    // enclose matches in the plain text parts in double brackets
    $arr[$i] = preg_replace('/(warfarin)/i','[[$1]]',$arr[$i]);
}
echo '<h3>Original:</h3>' . $str;
$str = implode('',$arr); // finally join them
echo '<h3>Changed:</h3>' . $str;

will result in

Original:

Use warfarin for the prevention of strokes. Use [[warfarin]] for the prevention of strokes. Use [[generic warfarin formulation]] for the prevention of strokes

Changed:

Use [[warfarin]] for the prevention of strokes. Use [[warfarin]] for the prevention of strokes. Use [[generic warfarin formulation]] for the prevention of strokes
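
If several words need the same treatment, the split-and-replace idea above could be wrapped in a helper along these lines. This is only a sketch: the function name `bracket_terms`, the word list, and the added `\b` word boundaries are my assumptions and not part of the original code.

```php
<?php
// Hypothetical helper: wraps each listed term in [[...]] unless the term
// already sits inside an existing [[...]] group.
function bracket_terms($text, array $terms)
{
    // Escape each term so it is matched literally, then build one alternation.
    $quoted = array();
    foreach ($terms as $term) {
        $quoted[] = preg_quote($term, '/');
    }
    // Assumption: whole-word matches only (\b), case-insensitive as above.
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';

    // Split on [[...]] groups; the captured groups land on the odd indexes.
    $parts = preg_split('/(\[\[.*?\]\])/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    // Even indexes hold the plain text between groups.
    for ($i = 0, $n = count($parts); $i < $n; $i += 2) {
        $parts[$i] = preg_replace($pattern, '[[$1]]', $parts[$i]);
    }
    return implode('', $parts);
}

echo bracket_terms(
    'Use warfarin or [[generic warfarin formulation]] with aspirin',
    array('warfarin', 'aspirin')
);
// prints: Use [[warfarin]] or [[generic warfarin formulation]] with [[aspirin]]
```

Because all terms go into a single alternation, the text is split and scanned once rather than once per word.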


4 Comments

Very elegant answer, thanks! I marked Eugene's answer as the accepted one however since it fits my particular programming problem just a little bit better. Not your fault.
Sure, no problem. Perhaps somebody else might benefit from this in the future.
In the end I did turn to this solution because it is much more flexible (I can use alternative expressions to split the string). I was wary of a possible performance impact of the for loop (which, as I have to process many more words than just 'warfarin', is nested inside another loop), but so far there has been no noticeable delay.
In case you encounter a speed issue… while loops work much faster than for loops. In the past I tested all types of loops in several different languages and in certain conditions while loops used like $i=-1;while(++$i<etc){} (as far as I remember) were noticeably faster than the others.
1

Try this:

echo preg_replace( "/(warfarin)([^\]]+(\[|$))/", "[[$1]]$2", "Use generic warfarin[[ formulation for]] the prevention of strokes\n" );

I assume that there won't be any case of closing brackets without opening brackets.
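
Under the same assumption (brackets are always balanced and never nested), a variant using a negative lookahead might handle several occurrences in one subject. Unlike lookbehinds, lookaheads in PCRE may have variable length. This is a sketch, not a tested drop-in: the variable names are illustrative, and the pattern skips a match whenever the next bracket after it is a closing `]]`.

```php
<?php
// Skip "warfarin" when the next bracket sequence after it is "]]",
// i.e. when the word sits inside an open [[...]] group.
$pattern = '/warfarin(?![^\[\]]*\]\])/';

$subject = 'Use warfarin for the prevention of strokes. '
         . 'Use [[warfarin]] for the prevention of strokes. '
         . 'Use [[generic warfarin formulation]] for the prevention of strokes';

// $0 refers to the whole match, so no capture group is needed.
$result = preg_replace($pattern, '[[$0]]', $subject);

echo $result;
// prints: Use [[warfarin]] for the prevention of strokes. Use [[warfarin]] for the prevention of strokes. Use [[generic warfarin formulation]] for the prevention of strokes
```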

2 Comments

Thanks very much, this is very nice! You are right, I can assume that there should always be opening brackets.
Well, it turns out that this solution cannot replace multiple occurrences in the same subject, so to be honest I must give the answer to @inhan.
