I am trying to match a pattern which may be nested.
Here is some example data where I want to extract the content inside the {{ loop ... }} element:
<ul>
{{ loop #users as #u }}
<li>{{ #u.first_name }} {{ #u.last_name }}</li>
{{ endloop }}
</ul>
I can match it correctly with this regex:
/{{\s+loop\s+#([a-zA-Z_][a-zA-Z0-9_]*)((?:\.[a-zA-Z0-9_]+)*)\s+as\s+#([a-zA-Z_][a-zA-Z0-9_]*)\s+}}(.*){{\s+endloop\s+}}/sU
Explanation:
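- /{{ : start of the opening loop element
- \s+loop\s+ : the loop keyword
- #([a-zA-Z_][a-zA-Z0-9_]*) : a variable name (ex: #var)
- ((?:\.[a-zA-Z0-9_]+)*) : optional variable key (ex: #var.key)
- \s+as\s+ : the as keyword
- #([a-zA-Z_][a-zA-Z0-9_]*)\s+ : alias variable name (ex: #alias)
- }} : end of the opening loop element
- (.*) : the loop content
- {{\s+endloop\s+}} : the closing loop element
- /sU : end of the pattern and flags (s: the dot also matches newlines, U: ungreedy)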
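For readers who want to try the pattern outside regexr.com, here is a minimal sketch in Python. It is a translation, not the original setup: Python's built-in `re` module has no U flag, so the lazy quantifier `(.*?)` stands in for it and `re.DOTALL` stands in for the s flag. The names `LOOP_RE` and `template` are made up for the illustration.

```python
import re

# Assumption: this is a Python port of the pattern above. (.*?) replaces the
# U (ungreedy) flag and re.DOTALL replaces the s flag. The other quantifiers
# stay greedy, which gives the same captures on this input.
LOOP_RE = re.compile(
    r"\{\{\s+loop\s+#([a-zA-Z_][a-zA-Z0-9_]*)((?:\.[a-zA-Z0-9_]+)*)"
    r"\s+as\s+#([a-zA-Z_][a-zA-Z0-9_]*)\s+\}\}(.*?)\{\{\s+endloop\s+\}\}",
    re.DOTALL,
)

template = """<ul>
{{ loop #users as #u }}
<li>{{ #u.first_name }} {{ #u.last_name }}</li>
{{ endloop }}
</ul>"""

match = LOOP_RE.search(template)
# Groups: variable name, optional key, alias, loop content.
variable, key, alias, content = match.groups()
print(variable, repr(key), alias)
print(content.strip())
```

On this single, non-nested loop the four groups come out as the variable name, an empty key, the alias, and the `<li>` line.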
Where it fails
With nested loops, I need to get the content of the first level loop (because content is then parsed recursively in my project). Here is some example data:
1| <ul>
2| {{ loop #users as #u }}
3| <li>
4| {{ #u.first_name }} {{ #u.last_name }}
5| <ul>
6| {{ loop #u.friends as #f }}
7| <li>{{ #f.first_name }} {{ #f.last_name }}</li>
8| {{ endloop }}
9| </ul>
10| </li>
11| {{ endloop }}
12| </ul>
13|
14| {{ loop #foo as #bar }}
15| <a href="#">{{ #bar }}</a>
16| {{ endloop }}
With this content, the pattern will stop at the first {{ endloop }} encountered (lines 2-8).
If I remove the U flag (ungreedy), I can't use multiple loops, because the match then runs to the last {{ endloop }} even when it belongs to a different loop (lines 2-16).
I had a previous version of the pattern using the /m flag (multiline) but it failed too as it only matched the deepest level loop (lines 6-8).
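The first two failure modes can be reproduced with a short Python sketch. It assumes a simplified opening-tag pattern (`\w+` instead of the full character classes) and a hypothetical helper, `matched_lines`, that reports which lines of the sample the first match covers. The lazy and greedy variants stand in for the pattern with and without the U flag.

```python
import re

# Simplified tag patterns, for illustration only.
OPEN = r"\{\{\s+loop\s+#\w+(?:\.\w+)*\s+as\s+#\w+\s+\}\}"
CLOSE = r"\{\{\s+endloop\s+\}\}"

template = """<ul>
{{ loop #users as #u }}
<li>
{{ #u.first_name }} {{ #u.last_name }}
<ul>
{{ loop #u.friends as #f }}
<li>{{ #f.first_name }} {{ #f.last_name }}</li>
{{ endloop }}
</ul>
</li>
{{ endloop }}
</ul>

{{ loop #foo as #bar }}
<a href="#">{{ #bar }}</a>
{{ endloop }}"""

def matched_lines(pattern):
    """Return the 1-based (first, last) line numbers of the first match."""
    m = re.search(pattern, template, re.DOTALL)
    first = template.count("\n", 0, m.start()) + 1
    last = template.count("\n", 0, m.end()) + 1
    return first, last

lazy = matched_lines(OPEN + r"(.*?)" + CLOSE)   # like the U flag
greedy = matched_lines(OPEN + r"(.*)" + CLOSE)  # like removing the U flag
print(lazy)    # (2, 8): stops at the inner {{ endloop }}
print(greedy)  # (2, 16): swallows the unrelated second loop
```

Neither quantifier mode can pair an opening tag with its own closing tag, because a plain `.*` has no notion of how many loops are currently open.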
I made many attempts (mostly on regexr.com) but made no progress. I searched for solutions involving "recursive patterns"; the best I found was this question, but after many attempts I could not adapt it to my project.
- Is there a flag or combination of flags that gives priority to this kind of pattern?
- I read a bit about recursion in regex with (?R) but haven't succeeded in using it. Would it be helpful in my case?
- The obvious last question: how can I match the whole content of the first-level loops?
I am not only looking for the solution; I would really like to understand how to solve this. Link to current RegexR: regexr.com/426fd
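To make the goal concrete, here is one possible way to get the first-level loops, sketched in Python. It deliberately swaps the technique: PCRE supports recursion through (?R), but Python's built-in `re` does not, so this sketch tokenizes the opening and closing tags and tracks nesting with an explicit depth counter. `TAG_RE` and `top_level_loops` are hypothetical names, and unclosed loops are simply dropped.

```python
import re

# One tokenizer pattern: an opening loop tag (named groups) or a closing tag.
TAG_RE = re.compile(
    r"\{\{\s+loop\s+#(?P<var>[a-zA-Z_][a-zA-Z0-9_]*)(?P<key>(?:\.[a-zA-Z0-9_]+)*)"
    r"\s+as\s+#(?P<alias>[a-zA-Z_][a-zA-Z0-9_]*)\s+\}\}"
    r"|(?P<close>\{\{\s+endloop\s+\}\})"
)

def top_level_loops(template):
    """Return (var, key, alias, content) for each first-level loop."""
    loops = []
    depth = 0
    header = None       # captured groups of the open first-level tag
    content_start = 0   # offset just after that tag
    for tag in TAG_RE.finditer(template):
        if tag.group("close") is None:      # opening tag
            if depth == 0:
                header = (tag.group("var"), tag.group("key"), tag.group("alias"))
                content_start = tag.end()
            depth += 1
        elif depth > 0:                     # closing tag while a loop is open
            depth -= 1
            if depth == 0:
                loops.append(header + (template[content_start:tag.start()],))
        # A stray {{ endloop }} at depth 0 is ignored.
    return loops

template = """<ul>
{{ loop #users as #u }}
<li>
{{ #u.first_name }} {{ #u.last_name }}
<ul>
{{ loop #u.friends as #f }}
<li>{{ #f.first_name }} {{ #f.last_name }}</li>
{{ endloop }}
</ul>
</li>
{{ endloop }}
</ul>

{{ loop #foo as #bar }}
<a href="#">{{ #bar }}</a>
{{ endloop }}"""

loops = top_level_loops(template)
for var, key, alias, content in loops:
    print(var, repr(key), alias, len(content))
```

Each returned content string still contains any inner loops verbatim, so it can be passed back to `top_level_loops` for the recursive parsing step described earlier.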
loop/{{ endloop }} (notice this "corrupt" input demo, and revo's solution would grab greedily from the first loop till the last {{ endloop }} upon such input (demo).