
I'm trying to extract every occurrence of a code snippet, along with its 3 parameters, from a text. I do this with a regular expression and PHP's preg_match_all function.

It works fine if I have just one occurrence of the snippet present in the text. If there are two or more I get a weird result.

I'm not very experienced with regular expressions, so I'm having trouble understanding what I'm missing.

Function

public function getGallerySnippetOccurrences($text) {

    $ptn = '/{# +gallery +(src|width|height)=\[(.*)\] +(src|width|height)=\[(.*)\] +(src|width|height)=\[(.*)\] +#}/';

    if(preg_match_all($ptn,$text,$matches)){
        $turnedMatches = $this->turn_array($matches);
        return $turnedMatches;
    }
    else {
        return null;
    }
}

Text 1 (in this case it works as expected)

Lorem ipsum {# gallery src=[holiday_images/london] width=[400] height=[300] #} sid amet.

Returns:

array(1) {
  [0] =>
  array(7) {
    [0] =>
    string(66) "{# gallery src=[holiday_images/london] width=[400] height=[300] #}"
    [1] =>
    string(3) "src"
    [2] =>
    string(21) "holiday_images/london"
    [3] =>
    string(5) "width"
    [4] =>
    string(3) "400"
    [5] =>
    string(6) "height"
    [6] =>
    string(3) "300"
  }
}

Text 2 (unexpected behaviour)

Lorem ipsum {# gallery src=[holiday_images/london] width=[400] height=[300] #} sid amet {# gallery src=[holiday_images/paris] width=[400] height=[300] #}

Returns

array(1) {
  [0] =>
  array(7) {
    [0] =>
    string(141) "{# gallery src=[holiday_images/london] width=[400] height=[300] #} sid amet {# gallery src=[holiday_images/paris] width=[400] height=[300] #}"
    [1] =>
    string(3) "src"
    [2] =>
    string(96) "holiday_images/london] width=[400] height=[300] #} sid amet {# gallery src=[holiday_images/paris"
    [3] =>
    string(5) "width"
    [4] =>
    string(3) "400"
    [5] =>
    string(6) "height"
    [6] =>
    string(3) "300"
  }
}

What am I doing wrong?

  • Make it non-greedy: /{# +gallery +(src|width|height)=\[(.*?)] +(src|width|height)=\[(.*?)] +(src|width|height)=\[(.*?)] +#}/ Commented Mar 13, 2019 at 17:13
  • see my answer below Commented Mar 13, 2019 at 17:54

2 Answers


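Your pattern uses greedy matches, (.*), which should be replaced with the non-greedy form, (.*?). A greedy .* consumes as much as it can, so with two snippets in the text the first capture runs on to the closing ] of the second snippet's src value. The corrected pattern is: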

$ptn = '/{# +gallery +(src|width|height)=\[(.*?)\] +(src|width|height)=\[(.*?)\] +(src|width|height)=\[(.*?)\] +#}/';
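
To make the difference concrete, here is a minimal sketch that runs both patterns against Text 2 from the question. The variable names ($greedy, $lazy, and so on) are illustrative only and do not come from the original code.

```php
<?php
// Text 2 from the question: two gallery snippets in a single string.
$text = 'Lorem ipsum {# gallery src=[holiday_images/london] width=[400] height=[300] #} sid amet {# gallery src=[holiday_images/paris] width=[400] height=[300] #}';

// Original pattern: greedy (.*) lets the first value span both snippets.
$greedy = '/{# +gallery +(src|width|height)=\[(.*)\] +(src|width|height)=\[(.*)\] +(src|width|height)=\[(.*)\] +#}/';

// Corrected pattern: lazy (.*?) stops at the first closing bracket that lets the rest match.
$lazy = '/{# +gallery +(src|width|height)=\[(.*?)\] +(src|width|height)=\[(.*?)\] +(src|width|height)=\[(.*?)\] +#}/';

$greedyCount = preg_match_all($greedy, $text, $greedyMatches, PREG_SET_ORDER);
$lazyCount   = preg_match_all($lazy, $text, $lazyMatches, PREG_SET_ORDER);

echo $greedyCount, "\n";        // 1
echo $lazyCount, "\n";          // 2
echo $lazyMatches[0][2], "\n";  // holiday_images/london
echo $lazyMatches[1][2], "\n";  // holiday_images/paris
```

With PREG_SET_ORDER each element of the result is one complete match, so the existing turn_array step may no longer be needed; that depends on what turn_array does, which the question does not show.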


As pointed out in my comment below your question, making the quantifiers non-greedy will make it work. However, that still leaves your regex repetitive and inefficient.

You may consider this approach for both points:

$re = '/{\#
\h+gallery
\h+(src|width|height)=\[([^]]*)]
\h+((?1))=\[([^]]*)]
\h+((?1))=\[([^]]*)]
\h*\#}/x';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

// Print the entire match result
var_dump($matches); 


  • Note how this regex defines a sub-pattern once and reuses it with (?1), which avoids repeating (src|width|height) three times.
  • Also note the use of the more efficient negated character class [^]]* instead of .*? to capture the values: it can never run past a closing ], so it needs far less backtracking.
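
Since the three attributes may appear in any order, it can help to turn each match into a name => value map. The following is a rough sketch under that assumption; the sample string and the $galleries variable are made up for illustration, and the second snippet deliberately lists its attributes in a different order.

```php
<?php
// Illustrative input: the second snippet uses a different attribute order.
$str = 'Lorem ipsum {# gallery src=[holiday_images/london] width=[400] height=[300] #} sid amet {# gallery height=[200] src=[holiday_images/paris] width=[600] #}';

// Same pattern as above: group 1 defines the attribute names, (?1) reuses it.
$re = '/{\#
\h+gallery
\h+(src|width|height)=\[([^]]*)]
\h+((?1))=\[([^]]*)]
\h+((?1))=\[([^]]*)]
\h*\#}/x';

preg_match_all($re, $str, $matches, PREG_SET_ORDER);

$galleries = [];
foreach ($matches as $m) {
    // Groups 1, 3, 5 hold the attribute names; groups 2, 4, 6 hold their values.
    $galleries[] = [
        $m[1] => $m[2],
        $m[3] => $m[4],
        $m[5] => $m[6],
    ];
}

print_r($galleries);
```

Note that the pattern does not check that each attribute appears exactly once. If a snippet repeats a name, the later value overwrites the earlier one in the map, so a count($gallery) === 3 check may be worth adding where that matters.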
