I need to make 3 groups out of the following text:

[startA]
this is the first group
 [startB]
 blabla
[end]
[end]
[startA]
this is the second group
 [startB]
 blabla
[end]
[end]
[startA]
this is the third group
 [startB]
 blabla
[end]
[end]

As you can see, each group begins with [startA] and ends with [end], so it should be easy to write a regex that matches this.
The problem is that inside a group, the string [end] can appear an arbitrary number of times.
The regex should match a group that starts with [startA] and ends with the [end] right before the next [startA], not at an earlier [end].

I think it should be done with a lookahead, but none of my attempts have worked so far.
Is it possible to do this with a regex?

2 Answers

You should use a recursive regex pattern:

preg_match_all('/\[(?!end)[^[\]]+\](?:[^[\]]*|[^[\]]*(?R)[^[\]]*)\[end\]\s*/', $s, $m);

See this demo.
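
To see what the recursive pattern returns, here is a minimal sketch that runs it against the sample from the question. The names `$s` and `$m` follow the one-liner above; the sample string, `$pattern`, `$count`, and the printing loop are added here for illustration only. `(?R)` recurses into the whole pattern, so it consumes one nested `[startB] ... [end]` block before the outer `[end]` is matched.

```php
<?php
// Sample input from the question: three top-level groups,
// each containing one nested [startB] ... [end] block.
$s = <<<TEXT
[startA]
this is the first group
 [startB]
 blabla
[end]
[end]
[startA]
this is the second group
 [startB]
 blabla
[end]
[end]
[startA]
this is the third group
 [startB]
 blabla
[end]
[end]
TEXT;

// An opening [tag] that is not [end], then either plain text or
// text + one nested block (matched via (?R)) + text,
// then the closing [end] and any trailing whitespace.
$pattern = '/\[(?!end)[^[\]]+\](?:[^[\]]*|[^[\]]*(?R)[^[\]]*)\[end\]\s*/';

$count = preg_match_all($pattern, $s, $m);

// $m[0] holds the full text of each top-level group.
echo $count, " groups found\n";
foreach ($m[0] as $i => $group) {
    echo "--- group ", $i + 1, " ---\n", trim($group), "\n";
}
```

As written, the pattern allows at most one nested block per level, which is all the sample needs. Handling several sibling blocks inside one group would require repeating the recursive part.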

Yes, you can indeed solve this with a lookahead:

$test_string = <<<TEST
[startA]
this is the first group
 [startB]
 blabla
[end]
[end]
[startA]
this is the second group
 [startB]
 blabla
[end]
[end]
[startA]
this is the third group
 [startB]
 blabla
[end]
[end]
TEST;
preg_match_all('#\[startA](.+?)\[end]\s*(?=\[startA]|$)#s', 
    $test_string, $matches);
var_dump($matches[1]);

Here's an ideone demo.

The key is the alternation inside the lookahead subpattern, which tests for either the next [startA] section or the end of the string ($).

Note the /s modifier: without it, the . metacharacter won't match newlines ("\n").
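
A quick way to check the note about /s is to run the same pattern with and without the modifier. This is a small sketch on a shortened two-group sample; `$input`, `$withS`, `$withoutS`, `$a`, and `$b` are names chosen here for illustration.

```php
<?php
// Shortened sample (assumed for illustration): two groups, each with a nested [end].
$input = "[startA]\nfirst\n [startB]\n x\n[end]\n[end]\n"
       . "[startA]\nsecond\n [startB]\n y\n[end]\n[end]";

// The same lookahead pattern, with and without the s (DOTALL) modifier.
$withS    = preg_match_all('#\[startA](.+?)\[end]\s*(?=\[startA]|$)#s', $input, $a);
$withoutS = preg_match_all('#\[startA](.+?)\[end]\s*(?=\[startA]|$)#', $input, $b);

var_dump($withS);     // number of groups matched with the s modifier
var_dump($withoutS);  // number of groups matched without it

// Each captured body still contains the nested [end]; only the [end] that is
// followed by the next [startA] (or the end of the string) closes the group.
var_dump($a[1]);
```

With this input, the first call should find both groups and the second should find none, because every [startA] is immediately followed by a newline that . cannot match without the modifier.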
