I have this code:

$string="some text {@block}outside{@block}inside{@}outside{@} other text";

function catchPattern($string,$layer){
  preg_match_all(
    "/\{@block\}".
      "(".
        "(".
           "[^()]*|(?R)".
        ")*".
      ")".
    "\{@\}/",$string,$nodes);
  if(count($nodes)>1){
    for($i=0;$i<count($nodes[1]); $i++){
      if(is_string($nodes[1][$i])){
        if(strlen($nodes[1][$i])>0){
          echo "<pre>Layer ".$layer.":   ".$nodes[1][$i]."</pre><br />";
          catchPattern($nodes[1][$i],$layer+1);
        }
      }
    }
  }
}

catchPattern($string,0);

That gives me this output:

Layer 0:   outside{@block}inside{@}outside

Layer 1:   inside

That works as expected. But if I slightly change the string and the regex:

$string="some text {@block}outside{@block}inside{@end}outside{@end} other text";

function catchPattern($string,$layer){
  preg_match_all(
    "/\{@block\}".
      "(".
        "(".
           "[^()]*|(?R)".
        ")*".
      ")".
    "\{@end\}/",$string,$nodes);
  if(count($nodes)>1){
    for($i=0;$i<count($nodes[1]); $i++){
      if(is_string($nodes[1][$i])){
        if(strlen($nodes[1][$i])>0){
          echo "<pre>Layer ".$layer.":   ".$nodes[1][$i]."</pre><br />";
          catchPattern($nodes[1][$i],$layer+1);
        }
      }
    }
  }
}

catchPattern($string,0);

I don't get any output. Why? I expected the same output as before.

  • What are you actually asking here? Please update the question with sample text to be matched Commented Mar 23, 2013 at 9:14
  • @kaᵠ: Sample text is in the code. Commented Mar 23, 2013 at 9:18
  • I've updated the question. But the question is clear: why doesn't the second case behave the same as the first one? Commented Mar 23, 2013 at 9:20
  • @SimoneDemoGentili: Using the regex from the answer below, this is a more efficient way of running the above than using for() and count(): viper-7.com/u7iun7 Commented Mar 23, 2013 at 9:25

1 Answer

The problem is that the backtracking limit is exhausted. You can always raise the limit (in PHP, the pcre.backtrack_limit setting). However, in the cases I have come across, rewriting the regex is the better solution.
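A failed match of this kind is easy to miss, because preg_match_all() simply returns false and $nodes stays empty. The following is a minimal sketch of how the failure could be surfaced with preg_last_error(). The helper name matchOrExplain is hypothetical and not part of the question's code, and the exact error reported for the question's pattern may vary with the PHP version and PCRE settings.

```php
<?php
// Hypothetical helper: run preg_match_all() and report a PCRE failure
// instead of silently producing no output.
function matchOrExplain(string $pattern, string $subject): array
{
    $count = preg_match_all($pattern, $subject, $matches);
    if ($count === false) {
        $code = preg_last_error();
        $reason = $code === PREG_BACKTRACK_LIMIT_ERROR
            ? 'backtrack limit exhausted'
            : "PCRE error code $code";
        return ['ok' => false, 'reason' => $reason, 'matches' => []];
    }
    return ['ok' => true, 'reason' => '', 'matches' => $matches];
}

// The limit can be raised, although rewriting the pattern is usually better:
// ini_set('pcre.backtrack_limit', '10000000');

// The pattern and string from the question's second example.
$pattern = '/\{@block\}(([^()]*|(?R))*)\{@end\}/';
$string  = 'some text {@block}outside{@block}inside{@end}outside{@end} other text';

$result = matchOrExplain($pattern, $string);
echo $result['ok'] ? "matched\n" : "failed: {$result['reason']}\n";
```

On a setup like the one in the question, this would be expected to print the backtrack-limit message rather than nothing at all.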

You can't arbitrarily modify an existing regex and expect it to keep working, especially a recursive one. It looks like you took an existing bracket-matching regex and adapted it. There are a few problems in your regex:

  • [^()]*: There is no reason to exclude ( and ) from the text inside the {@block}{@end} portion. The more severe problem is that it also matches { and }. The engine runs all the way to the nearest ( or ), or to the end of the string, fails to match, and then backtracks. This is why the backtracking limit is reached.

    This can be fixed by changing this portion to [^{}] to disallow {} inside {@block}{@end}. Nested {@block}{@end} will still be matched, due to the recursion.

    Note that this completely disallows { and } as text within {@block}{@end}. It may be possible to modify the regex to allow such cases, depending on the escaping scheme.

    I also changed the quantifier of [^{}] from * to +, since there is no reason to match an empty string when the quantifier of the whole group ([^{}]+|(?R)) is *.

    /\{@block\}((?:[^{}]+|(?R))*)\{@end\}/
    
  • After the modification above, the second problem is with invalid input strings. By default, a quantifier backtracks until a match is found or all possibilities are exhausted, so invalid input can still hit the backtracking limit.

    Since what [^{}]+ can match and what the recursive regex can match are mutually exclusive (see footnote 1), the regex is unambiguous and can be matched without backtracking. We can tell the engine not to backtrack by using possessive quantifiers, which are the normal quantifiers with + appended.
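To make the effect of a possessive quantifier concrete, here is a small sketch on a toy pattern unrelated to the {@block} syntax. The greedy a+ gives back one character so the trailing a can match, while the possessive a++ keeps everything it consumed and the match fails immediately. The variable names are illustrative only.

```php
<?php
// Greedy: a+ first takes "aaa", then backtracks one step so the final "a" can match.
$greedy = preg_match('/a+a/', 'aaa');

// Possessive: a++ takes "aaa" and never gives anything back, so the final "a" cannot match.
$possessive = preg_match('/a++a/', 'aaa');

echo "greedy: $greedy, possessive: $possessive\n";
```

In the block-matching regex this refusal to backtrack is harmless, because the two alternatives can never match the same text.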

The final solution is:

/\{@block\}((?:[^{}]++|(?R))*+)\{@end\}/

Demo
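As a rough sketch of how the final regex could be plugged back into the question's recursive function, the version below collects the layers into an array instead of echoing HTML. The function name collectBlocks and its return shape are assumptions made for illustration and are not taken from the original code.

```php
<?php
// Hypothetical rewrite of catchPattern(): returns [layer, content] pairs.
function collectBlocks(string $text, int $layer = 0): array
{
    // The final regex from the answer; (?R) recurses into the whole pattern.
    $pattern = '/\{@block\}((?:[^{}]++|(?R))*+)\{@end\}/';
    $found = [];

    if (preg_match_all($pattern, $text, $nodes) > 0) {
        foreach ($nodes[1] as $content) {
            if ($content === '') {
                continue; // skip empty blocks, as the original code does
            }
            $found[] = [$layer, $content];
            // Descend into the captured content to find nested blocks.
            $found = array_merge($found, collectBlocks($content, $layer + 1));
        }
    }
    return $found;
}

$string = 'some text {@block}outside{@block}inside{@end}outside{@end} other text';
foreach (collectBlocks($string) as [$layer, $content]) {
    echo "Layer $layer:   $content\n";
}
```

For the sample string this should list the outer block at layer 0 and the nested "inside" block at layer 1, mirroring the output of the first example in the question.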

Footnotes

1: This follows because text matching [^{}]+ never starts with {, while text matching the recursive regex must start with {.


2 Comments

@nhahtdh What if I want to have some variables inside {@block}, for example {@block}Hello {name}!{@end}? Your pattern doesn't allow { or } inside a block.
@WojciechJasiński: You need to change [^{}]++ to (?:(?!\{@end\}|\{@block\}).)++. Basically, you need to prevent the starting and ending tags from being recognized as content between the tags; otherwise, the balance is not guaranteed.
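The substitution suggested in the comment above can be tried out with a short sketch like the following. The sample strings and variable names are made up for illustration. Note that . does not match newlines unless the s modifier is added, which this sketch does not do.

```php
<?php
// Variant from the comment: any character may appear inside a block,
// as long as it does not begin a {@block} or {@end} tag.
$pattern = '/\{@block\}((?:(?:(?!\{@end\}|\{@block\}).)++|(?R))*+)\{@end\}/';

// Braces that are not part of a tag are now allowed as content.
preg_match($pattern, 'say {@block}Hello {name}!{@end} now', $simple);
echo $simple[1], "\n";

// Nested blocks are still balanced through the recursion.
preg_match($pattern, '{@block}a {x} {@block}b{@end} c{@end}', $nested);
echo $nested[1], "\n";
```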
