If you need to dissect and process your matching substrings based on which characters they contain, it seems most logical to separate the components during the regex step. Concern yourself with pattern optimization after accuracy and ease of handling are ironed out.
My pattern contains three capture groups; only the middle one requires a positive-length string. Negated character classes are used for pattern efficiency. I am assuming that your substrings will not contain #, which is used to delimit the substrings. If they may contain #, then please update your question and I'll update my answer.
Pattern Demo
Pattern Explanation:
/            // pattern delimiter
##           // match leading substring delimiter
(!)?         // optionally capture: an exclamation mark
([^#|]+)     // greedily capture: one or more non-hash, non-pipe characters
\|?          // optionally match: a pipe
([^#]+)?     // optionally capture: one or more non-hash characters
##           // match trailing substring delimiter
/            // pattern delimiter
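To see how the three groups line up in practice, here is a minimal sketch that applies the same pattern (anchored with ^ and $) to a single placeholder and returns named fields. The function name parseTag and the keys negated, name, and params are my own labels for illustration; they are not part of the question.

```php
<?php
// Hypothetical helper: splits one ##...## placeholder into named parts.
// The field names are assumptions made for this illustration.
function parseTag(string $tag): ?array
{
    if (!preg_match('/^##(!)?([^#|]+)\|?([^#]+)?##$/', $tag, $m)) {
        return null;                  // not a ##...## placeholder
    }

    return [
        'negated' => $m[1] === '!',   // group 1 is '' when there is no "!"
        'name'    => $m[2],           // group 2 always has at least one character
        'params'  => $m[3] ?? null,   // group 3 is absent when nothing follows the pipe
    ];
}

var_export(parseTag('##test3|id=5##'));
```

The `?? null` fallback is needed because PHP omits trailing groups that did not participate in the match, so $m[3] may not exist at all.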
Code: (Demo)
$string = 'Lorem ipsum dolor sit amet, ##test## consectetur adipiscing elit. Pellentesque id congue massa. Curabitur ##test3|id=5## egestas ullamcorper sollicitudin. Mauris venenatis sed metus ##!test2## vitae pharetra.';
$result = preg_replace_callback(
    '/##(!)?([^#|]+)\|?([^#]+)?##/',
    function ($m) {
        echo '$m = ';
        var_export($m);
        echo "\n";
        // execute custom processing:
        if (isset($m[1][0])) { // check the first character of $m[1] ($m[1] is always set because $m[2] is always set)
            echo "exclamation found\n";
        }
        // $m[2] is required (will always be set)
        if (isset($m[3])) { // only set when the post-pipe group captured a positive-length string
            echo "post-pipe substring found\n";
        }
        echo "\n---\n";
        return '[some replacement text]';
    },
    $string
);
var_export($result);
Output:
$m = array (
  0 => '##test##',
  1 => '',
  2 => 'test',
)

---
$m = array (
  0 => '##test3|id=5##',
  1 => '',
  2 => 'test3',
  3 => 'id=5',
)
post-pipe substring found

---
$m = array (
  0 => '##!test2##',
  1 => '!',
  2 => 'test2',
)
exclamation found

---
'Lorem ipsum dolor sit amet, [some replacement text] consectetur adipiscing elit. Pellentesque id congue massa. Curabitur [some replacement text] egestas ullamcorper sollicitudin. Mauris venenatis sed metus [some replacement text] vitae pharetra.'
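If you are performing custom replacement processes, this method will "optimize" your string handling: the callback receives the optional exclamation mark, the required middle substring, and the optional post-pipe substring as separate elements of $m, so no second parsing step is needed.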
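As a rough illustration of such custom processing, the sketch below builds the replacement from the captured parts. The lookup table $values and the meaning given to ! and to the post-pipe substring are assumptions invented for this example; the question does not say what they should do.

```php
<?php
// Hypothetical lookup table; keys and values are illustrative only.
$values = ['test' => 'alpha', 'test2' => 'beta', 'test3' => 'gamma'];

$render = function (array $m) use ($values): string {
    if (!isset($values[$m[2]])) {
        return $m[0];                    // unknown placeholder: leave it untouched
    }
    $text = $values[$m[2]];
    if (isset($m[3])) {
        $text .= '[' . $m[3] . ']';      // assumed meaning: append the post-pipe part
    }
    if ($m[1] === '!') {
        $text = strtoupper($text);       // assumed meaning of "!": uppercase the result
    }
    return $text;
};

$input = 'a ##test## b ##test3|id=5## c ##!test2## d ##nope##';
$out = preg_replace_callback('/##(!)?([^#|]+)\|?([^#]+)?##/', $render, $input);
echo $out, "\n"; // prints: a alpha b gamma[id=5] c BETA d ##nope##
```

Returning $m[0] for unknown names leaves those placeholders in the text, which makes unhandled cases easy to spot.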
but how to benchmark and understand which is best?
Run them and look at memory usage and time to run the code.
[...] |), as in (\|?). You do not need to escape a hash symbol (#). Also, it's not entirely clear what your parameters are for what the regex should match. But the simplest and probably fastest regex for what you're trying to do is probably going to look like this: ~##[^\s#]+?##~s
s affects . which you didn't even use. Be greedy if possible. Engine likes it. ~##[^#]*##~
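On the benchmarking question, a minimal timing sketch might look like the following. The helper name benchmarkPattern, the iteration count, and the shortened subject string are assumptions. The three patterns do not accept exactly the same input, so treat the numbers as a rough comparison only. This measures time, not memory.

```php
<?php
// Hypothetical helper: times repeated preg_match_all() calls for one pattern.
function benchmarkPattern(string $pattern, string $subject, int $iterations = 10000): array
{
    $count = 0;
    $start = hrtime(true);                       // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $count = preg_match_all($pattern, $subject, $matches);
    }
    $elapsedMs = (hrtime(true) - $start) / 1e6;  // nanoseconds -> milliseconds

    return ['matches' => $count, 'ms' => $elapsedMs];
}

$subject = 'Lorem ##test## ipsum ##test3|id=5## dolor ##!test2## sit amet.';
$patterns = [
    'three capture groups' => '/##(!)?([^#|]+)\|?([^#]+)?##/',
    'lazy, no whitespace'  => '~##[^\s#]+?##~s',
    'greedy, any non-hash' => '~##[^#]*##~',
];

foreach ($patterns as $label => $pattern) {
    $r = benchmarkPattern($pattern, $subject);
    printf("%-22s %d matches, %.2f ms\n", $label, $r['matches'], $r['ms']);
}
```

Timings vary between runs and machines, so run it several times against input that resembles your real data before drawing conclusions.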