I'm building a Django-like templating engine in PHP that replaces everything between {{ }} with its related data. That part works, but I now need to perform the replacement only inside blocks, such as {% for y in x %} loop blocks, and ignore any {{ }} placeholders that are outside them.

I got partial results in this regex101 example, but it only matches the first {{ }} of each block. What I want is to match every {{ }} inside each block while excluding the ones outside.

  • any reason you want to reinvent the wheel? Commented Jan 19, 2018 at 6:01
  • @rtfm learning purposes? Commented Jan 19, 2018 at 6:02
  • There's recursive regexen for that. But given that syntactic complexity and your likely experience, this doesn't seem very advisable. Investigate proper tokenization/parsing approaches, and then give up and use an existing template engine. Commented Jan 19, 2018 at 6:03
  • @mario not very welcoming, but I'll have a look into recursive regex... thanks. Commented Jan 19, 2018 at 6:04
  • It's an interesting topic of course. But generally a bit too broad for an SO question. Extracting the matching block placeholders is at best 20% of the task. You could still use a regex. But for relating start and end scopes, you'll better use a state machine / lil' interpreter anyway. Commented Jan 19, 2018 at 6:07

1 Answer

For learning purposes (very good!) you have two options:

  1. A multi-step approach (easier to comprehend and to maintain)

  2. An overall regex solution (more complicated and possibly more "fancy")


Ad 1)

Match the blocks with the following expression (see a demo on regex101.com):

{%\ for.*?%}
(?s:.+?)
{%\ endfor.*?%}

And look for pairs of {{...}} in each block with:

{{\s*(.+?)\s*}}

In PHP, this could be:

<?php
$data = <<<DATA
{% for user in users %}
   Hello, {{ user.name }}, you are {{ user.age }} {{ user.name }}
ssssssssssssssssssssss {{ user.name }}
sdsddddddddddddddddddddddddddddd
{% endfor %}

{% for dog in dogs %}
   Your dog is {{ dog.age }} and likes {{ dog.food }}.
{% endfor %}
wwww
{{ user.name }}
DATA;

$block = '~
            {%\ for.*?%}
            (?s:.+?)
            {%\ endfor.*?%}
            ~x';

$variable = '~{{\s*(.+?)\s*}}~';

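// PREG_SET_ORDER yields one entry per block, so $match[0] is that block's full text.
// (With the default order, the loop would only ever see the first block.)
if (preg_match_all($block, $data, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // Collect every {{ ... }} placeholder inside the current block.
        if (preg_match_all($variable, $match[0], $variables, PREG_SET_ORDER)) {
            print_r($variables);
        }
    }
}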
?>
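
Since the question is ultimately about replacing, not just finding, the same two patterns can be nested with preg_replace_callback. The following is only a minimal sketch: the $values lookup table and the shortened $template are made up for illustration, and a real engine would resolve each placeholder per loop iteration instead of from a flat array.

```php
<?php
// Hypothetical flat lookup table, for illustration only.
$values = [
    'user.name' => 'Alice',
    'user.age'  => '30',
];

// Shortened template: two placeholders inside a block, one outside.
$template = '{% for user in users %}Hi {{ user.name }} ({{ user.age }}){% endfor %} and {{ user.name }}';

$block    = '~{%\ for.*?%}(?s:.+?){%\ endfor.*?%}~x';
$variable = '~{{\s*(.+?)\s*}}~';

// The outer callback receives each whole block; the inner one swaps each placeholder.
$result = preg_replace_callback($block, function (array $b) use ($variable, $values) {
    return preg_replace_callback($variable, function (array $v) use ($values) {
        // Unknown placeholders are left untouched.
        return $values[$v[1]] ?? $v[0];
    }, $b[0]);
}, $template);

echo $result, PHP_EOL;
// {% for user in users %}Hi Alice (30){% endfor %} and {{ user.name }}
```

The trailing {{ user.name }} survives because it is never handed to the inner callback.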


Ad 2)

Match all of the variables in question with an overall expression. Here, you'll need \G (which matches at the position of the last match) and some lookaheads (see a demo for this one at regex101.com as well):

(?:{%\ for.+?%}
|
\G(?!\A)
)
(?s:(?!{%).)*?\K
{{\s*(?P<variable>.+?)\s*}}

Now let's demystify this expression:

(?:{%\ for.+?%}
|
\G(?!\A)
)

Here, we want to match either {%\ for.+?%} (the \ before the space is needed because we are in verbose mode, where unescaped whitespace is ignored) or the position where the last match ended, via \G. Strictly speaking, \G matches either at the end of the last match or at the very beginning of the string. We do not want the latter, hence the negative lookahead (?!\A).
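
To see why the (?!\A) guard matters, here is a small sketch comparing the pattern with and without it. The $subject string is an assumed test input, chosen so that a placeholder sits at the very start of the string, outside any block.

```php
<?php
// Assumed test input: "a" is outside the block, "b" is inside.
$subject = '{{ a }} {% for x in y %}{{ b }}{% endfor %}';

// Shared tail of the pattern (skip ahead, then capture the placeholder).
$tail = '(?s:(?!{%).)*?\K{{\s*(?P<variable>.+?)\s*}}~x';

$withGuard    = '~(?:{%\ for.+?%}|\G(?!\A))' . $tail;
$withoutGuard = '~(?:{%\ for.+?%}|\G)' . $tail;

preg_match_all($withGuard, $subject, $guarded);
preg_match_all($withoutGuard, $subject, $unguarded);

print_r($guarded['variable']);   // only "b"
print_r($unguarded['variable']); // "a" and "b": \G matched at offset 0
```

Without the guard, \G succeeds at the start of the string and the outside placeholder is picked up as well.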

The next part

(?s:(?!{%).)*?\K

"fast-forwards" to the next {{ }} placeholder without ever crossing a {% tag.

Broken down, this says

(?s:        # open a non-capturing group, enabling the DOTALL mode
    (?!{%). # negative lookahead: any character that does not start {% (so we never run past the closing tag)
)*?         # lazy quantifier for the non-capturing group
\K          # make the engine "forget" everything to the left

Now, the rest is easy:

{{\s*(?P<variable>.+?)\s*}}

It's basically the same construct as in ad 1), only with a named capture group.

Again, in PHP, this could be:

<?php

$regex = '~
            (?:{%\ for.+?%}
            |
            \G(?!\A)
            )
            (?s:(?!{%).)*?\K
            {{\s*(?P<variable>.+?)\s*}}
            ~x';

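// $data is the same template string as in the first example.
if (preg_match_all($regex, $data, $variables)) {
    // The named group is available by name as well as by index 1.
    print_r($variables['variable']);
}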
?>
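
The single expression also works for replacing: because of \K, the reported match starts at the {{, so preg_replace_callback only swaps the placeholder itself and leaves the {% for %} tag alone. Again just a sketch, with a hypothetical $values table and a shortened $template standing in for real data.

```php
<?php
// Hypothetical flat lookup table, for illustration only.
$values = [
    'dog.age'  => '3',
    'dog.food' => 'bones',
];

// Shortened template: two placeholders inside the block, one after it.
$template = '{% for dog in dogs %}{{ dog.age }} / {{ dog.food }}{% endfor %} {{ dog.age }}';

$regex = '~
            (?:{%\ for.+?%}
            |
            \G(?!\A)
            )
            (?s:(?!{%).)*?\K
            {{\s*(?P<variable>.+?)\s*}}
            ~x';

// $m[0] is only the {{ ... }} part, since \K dropped everything before it.
$result = preg_replace_callback($regex, function (array $m) use ($values) {
    return $values[$m['variable']] ?? $m[0];
}, $template);

echo $result, PHP_EOL;
// {% for dog in dogs %}3 / bones{% endfor %} {{ dog.age }}
```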


With all that said, learning more complex patterns is generally a good idea, but so is not reinventing the wheel: there's always someone smarter than you and me who has probably already taken several edge cases into account.
