For example, I have a string like this:

{% a %}
    {% b %}
    {% end %}
{% end %}

I want to capture the content between {% a %} and its matching {% end %}, which here is {% b %} {% end %}.
I used to use {% \S+ %}(.*){% end %} for this, but it breaks when I add a second block, c:

{% a %}
    {% b %}
    {% end %}
{% end %}
{% c %}
{% end %}

Now it no longer works. How can I do this with a regular expression?

  • Is it a nested structure of arbitrary depth? If so, that is not a regular language. Commented Apr 7, 2011 at 15:46
  • You will probably have a much easier time matching the individual elements with a regular expression and using a stack to pair the opening and closing blocks. Commented Apr 7, 2011 at 15:47
  • @eldarerathis: That is a red herring, please stop repeating it. IT DOES NOT APPLY because it is absolutely trivial to match nested structures using modern patterns. Commented Apr 7, 2011 at 16:55
  • @casablanca: Please stop posting that idiotic and irrelevant link. It does not apply, and is wrong anyway. Commented Apr 7, 2011 at 17:18
  • @eldarerathis: Good thing that PHP regular expressions are not REGULAR! Commented Apr 7, 2011 at 18:54
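
The tokenize-and-stack idea suggested in the comments could be sketched roughly as follows. This is only an illustration in Python, not code posted in the thread: the names TAG and top_level_blocks are made up for the example, and it assumes every tag has the form {% name %} with {% end %} as the only closing tag.

```python
import re

# Hypothetical helper, not from the thread: a plain (non-recursive) regex
# finds the individual tags, and a stack pairs each opener with its end tag.
TAG = re.compile(r"\{%\s+(\w+)\s+%\}")


def top_level_blocks(text):
    """Return (name, inner_content) for every outermost block in text."""
    blocks = []
    stack = []  # one (name, offset just past the opening tag) per open block
    for m in TAG.finditer(text):
        name = m.group(1)
        if name != "end":
            stack.append((name, m.end()))
        elif stack:
            open_name, start = stack.pop()
            if not stack:  # an outermost block has just been closed
                blocks.append((open_name, text[start:m.start()]))
        else:
            raise ValueError(f"unmatched end tag at offset {m.start()}")
    if stack:
        raise ValueError(f"unclosed block {stack[-1][0]!r}")
    return blocks


sample = """{% a %}
    {% b %}
    {% end %}
{% end %}
{% c %}
{% end %}
"""

for name, inner in top_level_blocks(sample):
    print(name, repr(inner.strip()))
```

Because the nesting depth lives in the stack rather than in the pattern, this works in engines without recursion support, and it reports unbalanced input instead of silently skipping it.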

3 Answers

4

Given this test data:

$text = '
{% a %}
    {% b %}
        {% a %}
        {% end %}
    {% end %}
        {% b %}
        {% end %}
{% end %}
{% c %}
{% end %}
';

This tested script does the trick:

<?php
$re = '/
    # Match nested {% a %}{% b %}...{% end %}{% end %} structures.
    \{%[ ]\w[ ]%\}       # Opening delimiter.
    (?:                  # Group for contents alternatives.
      (?R)               # Either a nested recursive component,
    |                    # or non-recursive component stuff.
      [^{]*+             # {normal*} Zero or more non-{
      (?:                # Begin: "unrolling-the-loop"
        \{               # {special} Allow a { as long
        (?!              # as it is not the start of
          %[ ]\w[ ]%\}   # a new nested component, or
        | %[ ]end[ ]%\}  # the end of this component.
        )                # Ok to match { followed by
        [^{]*+           # more {normal*}. (See: MRE3!)
      )*+                # End {(special normal*)*} construct.
    )*+                  # Zero or more contents alternatives
    \{%[ ]end[ ]%\}      # Closing delimiter.
    /ix';
$count = preg_match_all($re, $text, $m);
if ($count) {
    printf("%d Matches:\n", $count);
    for ($i = 0; $i < $count; ++$i) {
        printf("\nMatch %d:\n%s\n", $i + 1, $m[0][$i]);
    }
}
?>

Here is the output:

2 Matches:

Match 1:
{% a %}
    {% b %}
        {% a %}
        {% end %}
    {% end %}
        {% b %}
        {% end %}
{% end %}

Match 2:
{% c %}
{% end %}

Edit: To match an opening tag whose name is longer than one word character, replace both occurrences of the \w token with (?!end)\w++ (as tchrist's excellent answer correctly does).

2

Here is a demo in Perl of an approach that works for your dataset. The same should work in PHP.

#!/usr/bin/env perl

use strict;
use warnings;

my $string = <<'EO_STRING';
    {% a %}
            {% b %}
            {% end %}
        {% end %}
    {% c %}
    {% end %}
EO_STRING


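print "MATCH: $&\n" while $string =~ m{
    \{ % \s+ (?!end) \w+ \s+ % \}             # opening tag: any name except "end"
    (?: (?: (?! \{ % | % \} ) . ) | (?R) )*   # a char that starts no delimiter, or a nested block
    \{ % \s+ end \s+ % \}                     # closing tag
}xsg;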

When run, that produces this:

MATCH: {% a %}
            {% b %}
            {% end %}
        {% end %}
MATCH: {% c %}
    {% end %}

There are several other ways to write that. You may have other constraints that you haven’t shown, but this should get you started.

0

What you're looking for is called a recursive regex. PHP's PCRE engine supports it through the (?R) construct, which re-applies the whole pattern at that point.

I'm not familiar enough with it to be able to help you with the pattern itself, but hopefully this is a push in the right direction.
