
I want to parse this string

[[delay-4]]Welcome! [[delay-2]]Do you have some questions for us?[[delay-1]] Please fill input field!

I need to get something like this:

[
    [0] => '[[delay-4]]Welcome!',
    [1] => '[[delay-2]]Do you have some questions for us?',
    [2] => '[[delay-1]] Please fill input field!'
];

The string can also look like this (without [[delay-4]] at the beginning):

Welcome! [[delay-2]]Do you have some questions for us?[[delay-1]] Please fill input field!

Expected output should be something like this:

    [
        [0] => 'Welcome!',
        [1] => '[[delay-2]]Do you have some questions for us?',
        [2] => '[[delay-1]] Please fill input field!'
    ];

I tried this regex (https://regex101.com/r/Eqztl1/1/):

(?:\[\[delay-\d+]])?([\w \\,?!.@#$%^&*()|`\]~\-='\"{}]+)

But I have a problem with that regex: if someone writes just a single [ in the text, the regex fails, and if I add [ to the character class, I get wrong results.

Can anyone help me with this?

2
  • What exactly do you want to match? Your current pattern already gives you that output: 3v4l.org/MvSiH. Do you want to match all consecutive square brackets even when they are unbalanced? Perhaps like this: regex101.com/r/g5PlFl/1 Commented Jul 26, 2019 at 12:03
  • Your regex works fine for both strings: 3v4l.org/PWDK9 and 3v4l.org/MvSiH. So what problem are you facing now? Commented Jul 26, 2019 at 12:06

3 Answers

2

Two simpler steps may be enough to get the result: first insert a newline before each delay tag, then split on newlines.

$result = preg_replace('/\s*(\[\[delay-\d+]])/i', "\n$1", $subject);
$result = preg_split('/\r?\n/i', $result, -1, PREG_SPLIT_NO_EMPTY);

Can be seen running here: https://ideone.com/Z5tZI3 and here: https://ideone.com/vnSNYI

This assumes that newline characters don't have special meaning and are OK to split on.


UPDATE: As noted in the comments below, it's possible with a single split:

$result = preg_split('/(?=\[\[delay-\d+]])/i', $subject, -1, PREG_SPLIT_NO_EMPTY);

But zero-length matches can cause subtle issues in regular expressions, so it is worth reading up on them before relying on this.
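As a rough sketch of the single-split idea: the version below keeps the \s* prefix from the first pattern in front of the lookahead (as the first comment suggests), so the whitespace before each tag is consumed. The helper name splitOnDelay and the extra test string with a lone [ are assumptions made for illustration only.

```php
<?php
// splitOnDelay is a hypothetical helper name used only for this sketch.
// It splits before every [[delay-N]] tag with a zero-width lookahead;
// the optional \s* swallows whitespace that precedes a tag.
function splitOnDelay(string $subject): array
{
    return preg_split('/\s*(?=\[\[delay-\d+]])/', $subject, -1, PREG_SPLIT_NO_EMPTY);
}

$withTag    = '[[delay-4]]Welcome! [[delay-2]]Do you have some questions for us?[[delay-1]] Please fill input field!';
$withoutTag = 'Welcome! [[delay-2]]Do you have some questions for us?[[delay-1]] Please fill input field!';
$loneBracket = 'Press [ to start [[delay-2]]Go'; // assumed input with a single "["

print_r(splitOnDelay($withTag));
print_r(splitOnDelay($withoutTag));
print_r(splitOnDelay($loneBracket));
```

Because the split only looks for the literal [[delay-N]] tag, a stray [ in the text is left alone. PREG_SPLIT_NO_EMPTY drops the empty piece produced when the string starts with a tag.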


6 Comments

What if the string contains line breaks? It can be done with a single preg_split using your first pattern if you replace the capture group with a lookahead. Also, ] isn't a special character.
Sure you can - but then you have to be aware of the issues around zero-length matches which is a complex issue to have to explain.
What kind of issues are you speaking about?
Zero-length matching issues, some of which are detailed here: regular-expressions.info/zerolength.html Note that it's unclear how this affects preg_split.
3v4l.org/j3IeT : By force of habit, I have never seen it as an issue! In any case, since you use the PREG_SPLIT_NO_EMPTY flag, it isn't a problem.
1

In your pattern

(?:\[\[delay-\d+]])?([\w \\,?!.@#$%^&*()|`\]~\-='\"{}]+)

there is no opening [ in the character class. If you add it, you get wrong results, as you say.

That is because after matching the delay part, the character class that follows (which now contains [) can match all the remaining characters, including those of the next delay parts.

What you could do is add [ to the character class, make the quantifier non-greedy, and add a positive lookahead that asserts either the next delay part or the end of the string, so that the last instance is matched as well.

If you only want the full match, the capturing group is not needed and can be omitted:

(?:\[\[delay-\d+]])?[\w \\,?!.@#$%^&*()|`[\]~\-='\"{}]+?(?=\[\[delay-\d+]]|$)
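A minimal sketch of how this pattern might be used with preg_match_all. The constant DELAY_SEGMENT, the helper matchSegments, and the trim() call (added because the lazy match keeps the space before the next tag) are assumptions for illustration, not part of the pattern itself.

```php
<?php
// Hypothetical constant holding the pattern above; a nowdoc avoids
// having to escape the quotes and backslashes inside the character class.
const DELAY_SEGMENT = <<<'REGEX'
/(?:\[\[delay-\d+]])?[\w \\,?!.@#$%^&*()|`[\]~\-='\"{}]+?(?=\[\[delay-\d+]]|$)/
REGEX;

// matchSegments is a hypothetical helper: it collects every full match
// and trims the whitespace left in front of the following tag.
function matchSegments(string $subject): array
{
    preg_match_all(DELAY_SEGMENT, $subject, $matches);
    return array_map('trim', $matches[0]);
}

$subject = '[[delay-4]]Welcome! [[delay-2]]Do you have some questions for us?[[delay-1]] Please fill input field!';
print_r(matchSegments($subject));

// A lone "[" in the text (assumed input) no longer breaks the match.
print_r(matchSegments('Press [ to start [[delay-2]]Go'));
```

Note that the character class is still a whitelist, so any character outside it (for example a line break) would end a segment early.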



1

You can do that without regex too.

Explode on [[ and loop over the array. If an item starts with "delay", prepend [[ to it again.

$str = '[[delay-4]]Welcome! [[delay-2]]Do you have some questions for us?[[delay-1]] Please fill input field!';

$arr = array_filter(explode("[[", $str));

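foreach($arr as &$val){
    // Restore the "[[" that explode() removed from each delay tag
    if(substr($val,0,5) == "delay") $val = "[[" . $val;
}
unset($val); // break the reference left over from the loop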

var_dump($arr);

https://3v4l.org/sIui1
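Since explode splits on every [[, text such as [[some text]] would also be cut into its own item. The following is only a sketch of one possible variant, under the assumption that just "[[delay-" should start a new item; the helper name splitOnDelayTags and the sample string are hypothetical.

```php
<?php
// splitOnDelayTags is a hypothetical helper, not part of the answer above.
// It starts a new item only when a piece begins with "delay-" and glues
// every other "[[" piece back onto the previous item.
function splitOnDelayTags(string $str): array
{
    $result = [];
    foreach (explode('[[', $str) as $i => $part) {
        if ($i === 0) {
            // Text before the first "[[" (empty when the string starts with a tag).
            if ($part !== '') {
                $result[] = $part;
            }
        } elseif (strncmp($part, 'delay-', 6) === 0 || $result === []) {
            // A delay tag (or the very first piece) starts a new item.
            $result[] = '[[' . $part;
        } else {
            // Any other "[[" is ordinary text: append it to the previous item.
            $result[count($result) - 1] .= '[[' . $part;
        }
    }
    return $result;
}

var_dump(splitOnDelayTags('[[delay-4]]Hi [[some text]] there[[delay-1]]Bye'));
```

Unlike the regex answers, this keeps whitespace exactly as it appears in the input, so items may carry a trailing space.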

4 Comments

@AnantSingh---AlivetoDie Since the OP's own regex hardcodes "delay", I think we can assume it's not an issue.
I agree, but you could still add a more generic way too, apart from the current answer. There is no harm in that, I think :). 3v4l.org/7H1AR and 3v4l.org/npM0p
This will fail if the text contains something like "[[some text]]". Thanks for the answer! :)
@AleksaArsić always include the fringe cases in your question. How should we know it was even possible to have [[some text]]? It's easily fixed but I see no point in doing so because I'm quite sure there will be a new case coming around the corner.
