I am trying to match a pattern that may be nested.
Here is some example data; I want to extract the content inside the {{ loop ... }} element:

<ul>
    {{ loop #users as #u }}
        <li>{{ #u.first_name }} {{ #u.last_name }}</li>
    {{ endloop }}
</ul>

This regex extracts it correctly:

/{{\s+loop\s+#([a-zA-Z_][a-zA-Z0-9_]*)((?:\.[a-zA-Z0-9_]+)*)\s+as\s+#([a-zA-Z_][a-zA-Z0-9_]*)\s+}}(.*){{\s+endloop\s+}}/sU

Explanation:

  • /
  • {{ start of open loop element
    • \s+loop\s+ loop keyword
    • #([a-zA-Z_][a-zA-Z0-9_]*) a variable name (ex: #var)
    • ((?:\.[a-zA-Z0-9_]+)*) optional variable key (ex: #var.key)
    • \s+as\s+ as keyword
    • #([a-zA-Z_][a-zA-Z0-9_]*)\s+ alias variable name (ex: #alias)
  • }} end of open loop element
  • (.*) the loop content
  • {{\s+endloop\s+}} close loop element
  • /sU
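For readers who want to try the pattern outside a PCRE tester, here is a minimal sketch of the same expression in Python. It assumes Python's built-in `re` module, which has no U (ungreedy) flag, so the lazy quantifier `.*?` stands in for it and `re.S` plays the role of `/s`. The names `LOOP_RE` and `template` are made up for this illustration.

```python
import re

# Python port of the pattern above; braces are escaped to keep them literal.
LOOP_RE = re.compile(
    r"\{\{\s+loop\s+"
    r"#([a-zA-Z_][a-zA-Z0-9_]*)"   # group 1: variable name (ex: #var)
    r"((?:\.[a-zA-Z0-9_]+)*)"      # group 2: optional key path (ex: .key)
    r"\s+as\s+"
    r"#([a-zA-Z_][a-zA-Z0-9_]*)"   # group 3: alias name (ex: #alias)
    r"\s+\}\}"
    r"(.*?)"                       # group 4: loop content, lazy like /U
    r"\{\{\s+endloop\s+\}\}",
    re.S,                          # let . match newlines, like /s
)

template = """<ul>
    {{ loop #users as #u }}
        <li>{{ #u.first_name }} {{ #u.last_name }}</li>
    {{ endloop }}
</ul>
"""

match = LOOP_RE.search(template)
variable, key_path, alias, content = match.groups()
print(variable, repr(key_path), alias)
print(content.strip())
```

On this non-nested sample, group 4 holds the `<li>` line, which is the behavior described above.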

Where it fails

With nested loops, I need to get the content of the first level loop (because content is then parsed recursively in my project). Here is some example data:

 1| <ul>
 2|     {{ loop #users as #u }}
 3|         <li>
 4|             {{ #u.first_name }} {{ #u.last_name }}
 5|             <ul>
 6|                 {{ loop #u.friends as #f }}
 7|                     <li>{{ #f.first_name }} {{ #f.last_name }}</li>
 8|                 {{ endloop }}
 9|             </ul>
10|         </li>
11|     {{ endloop }}
12| </ul>
13| 
14| {{ loop #foo as #bar }}
15|     <a href="#">{{ #bar }}</a>
16| {{ endloop }}

With this content, the pattern stops at the first {{ endloop }} it encounters (lines 2-8).
If I remove the U flag (ungreedy), multiple loops no longer work: the match runs to the last {{ endloop }}, even though it belongs to a different loop (lines 2-16).
A previous version of the pattern used the /m flag (multiline), but it failed too because it only matched the deepest loop (lines 6-8).
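The two failure modes can be reproduced with a condensed version of the data. This is a sketch in Python's `re` module, under some assumptions: the opening tag is simplified with `\w`, `.*?` again stands in for the U flag, and the one-letter markers A to E replace the HTML so the captured spans are easy to read.

```python
import re

# Simplified opening and closing tags (no capture groups, for brevity).
HEAD = r"\{\{\s+loop\s+#\w+(?:\.\w+)*\s+as\s+#\w+\s+\}\}"
TAIL = r"\{\{\s+endloop\s+\}\}"

lazy = re.compile(HEAD + r"(.*?)" + TAIL, re.S)    # behaves like /sU
greedy = re.compile(HEAD + r"(.*)" + TAIL, re.S)   # behaves like /s

template = (
    "{{ loop #users as #u }}"
    "A{{ loop #u.friends as #f }}B{{ endloop }}C"
    "{{ endloop }}"
    "D"
    "{{ loop #foo as #bar }}E{{ endloop }}"
)

lazy_content = lazy.search(template).group(1)
greedy_content = greedy.search(template).group(1)

# What a correct first-level match should capture for the first loop.
wanted = "A{{ loop #u.friends as #f }}B{{ endloop }}C"

print(lazy_content == wanted)    # the lazy version is cut short at the inner endloop
print(greedy_content == wanted)  # the greedy version swallows the second loop
```

Neither quantifier can count how many loops are currently open, which is why flags alone do not fix this.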

I made many attempts (mostly on regexr.com) without any progress. I searched for solutions involving "recursive patterns"; the best I found was this question, but after many attempts I could not adapt it to my project.


  • Is there a flag or combination of flags that gives priority to this kind of pattern?
  • I read a bit about recursion in regex with (?R) but haven't succeeded in using it. Would it be helpful in my case?
  • The obvious last question: how can I match the whole content of the first-level loops?

I am not only looking for a solution; I would really like to understand how to solve this. Link to the current RegExr: regexr.com/426fd

  • If you want to keep the captures, subroutines (recursion) won't help. See this demo. Commented Oct 31, 2018 at 8:46
  • @WiktorStribiżew actually your pattern does exactly what I'm looking for (maybe I explained it badly); I wanted to catch the 4th group of your pattern. You can post the pattern as an answer if you want, maybe with a little explanation of this last group? Commented Oct 31, 2018 at 9:06
  • Revo's and my solutions are different (they match different texts), so I would refrain from comparing their efficiency. My approach only matches strings between a corresponding loop / {{ endloop }} pair (see this "corrupt" input demo), whereas revo's solution would grab greedily from the first loop to the last {{ endloop }} on such input (demo). Commented Nov 2, 2018 at 8:10
  • Thanks for the additional information, @WiktorStribiżew. I am using your solution and credited you in the code; is that okay with you? Commented Nov 2, 2018 at 8:25
  • Yes, no problem. :) Commented Nov 2, 2018 at 8:30

2 Answers

Here is a performance-oriented fix for your problem (it takes a few hundred steps instead of thousands of backtracking steps):

{{\s+loop\s+#(\w+)[^#]*#(\w+)\s*}}(?:[^{]*+|(?R)|{+)*{{\s+endloop\s+}}

See live demo here

RegExp breakdown:

  • {{\s+loop\s+#(\w+)[^#]*#(\w+)\s*}} Match a starting loop structure and capture hashed words
  • (?: Start of non-capturing group
    • [^{]*+ Match anything but a { possessively
    • | Or
    • (?R) Recurse into the whole pattern
    • | Or
    • {+ Match any number of opening braces
  • )* End of group, repeated zero or more times (as many as possible)
  • {{\s+endloop\s+}} Match an ending structure

4 Comments

Works well too, except that it does not directly capture the content into a group. It could be useful for allowing multiple loop syntaxes. I may need to change the variable naming rules in my templating engine, which are currently strict. Thanks for your answer!
To capture the inner content, you need to enclose the non-capturing group in a capturing group. See it here. And which multiple loop syntaxes are you talking about?
I tried removing the ?: instead of enclosing it (dumb me). Currently only the loop ... as syntax is allowed, but with your pattern we could replace as with anything else.
Yes, that is another minor fix that could be made. The main reason to go with this answer is that it is fast both when matching and when failing. It's a huge difference.

Here is a quick fix of your current pattern:

{{\s+loop\s+#([a-zA-Z_]\w*)((?:\.\w+)*)\s+as\s+#([a-zA-Z_]\w*)\s*}}((?:(?!{{\s+(?:end)?loop\s).|(?R))*){{\s+endloop\s+}}

Note that you do not need the U modifier for this pattern to run as expected, but you still need the s modifier for . to match any char.

See the regex demo

The main difference is the replacement of .* with (?:(?!{{\s+(?:end)?loop\s).|(?R))*. It matches 0 or more repetitions of:

  • (?!{{\s+(?:end)?loop\s). - any char (.) that is not starting a sequence meeting the following pattern:
    • {{ - a {{ substring
    • \s+ - 1+ whitespaces
    • (?:end)? - an optional end substring
    • loop - a loop substring
    • \s - a whitespace
  • | - or
  • (?R) - the whole regex pattern

Besides, [a-zA-Z0-9_] is equivalent to \w if you are not using the u modifier or the (*UCP) PCRE verb, hence the whole pattern can be shortened a bit.
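The balancing that (?R) performs can also be spelled out by hand, which may help in understanding it. The sketch below is not the recursive regex itself: Python's built-in `re` module does not support (?R), so it swaps in a plain depth counter over opening and closing tags. The names `TOKEN_RE` and `top_level_loops` are hypothetical, and `re.ASCII` is used so that `\w` means `[a-zA-Z0-9_]` as discussed above.

```python
import re

# One alternation: either an opening loop tag (groups 1-3) or an endloop tag (group 4).
TOKEN_RE = re.compile(
    r"\{\{\s+loop\s+#([a-zA-Z_]\w*)((?:\.\w+)*)\s+as\s+#([a-zA-Z_]\w*)\s*\}\}"
    r"|(\{\{\s+endloop\s+\}\})",
    re.ASCII,
)

def top_level_loops(template):
    """Return (variable, key_path, alias, content) for each first-level loop."""
    results = []
    depth = 0        # number of loops currently open
    start = None     # match object of the current first-level opening tag
    for tok in TOKEN_RE.finditer(template):
        if tok.group(4) is None:          # opening tag
            if depth == 0:
                start = tok
            depth += 1
        elif depth > 0:                   # closing tag for an open loop
            depth -= 1
            if depth == 0:
                content = template[start.end():tok.start()]
                results.append((start.group(1), start.group(2), start.group(3), content))
        # a stray endloop at depth 0 is ignored
    return results

template = (
    "{{ loop #users as #u }}"
    "A{{ loop #u.friends as #f }}B{{ endloop }}C"
    "{{ endloop }}"
    "D"
    "{{ loop #foo as #bar }}E{{ endloop }}"
)

loops = top_level_loops(template)
for variable, key_path, alias, content in loops:
    print(variable, repr(key_path), alias, repr(content))
```

Each first-level content string can then be passed through the same function again to process the nested loops. Note that on malformed input this sketch behaves differently from the regexes above: a loop that is never closed is simply dropped, along with everything nested inside it.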

1 Comment

Thanks a lot for the explanation. It's still hard to understand, but I'll work on my skills. Thanks for the last piece of advice too; I'll shorten the pattern, as it's going to be really long!
