PHP Regex replace is not producing the desired result

Question

I'm creating a dictionary application in PHP and MariaDB, and trying to simulate some basic markdown. When I have a definition like this:

This is an example definition. Here is a link to [foo]. This is an [aliased link|bar].

Then [foo] will be translated into a link to the 'foo' definition page, and [aliased link|bar] will translate into a link to the 'bar' definition page. If there's a pipe then whatever's before the pipe (|) will become the link text, and after the pipe becomes the link destination. If there's no pipe, then the expression in brackets becomes the link text and destination.

So I would translate this to the following HTML:

This is an example definition. Here is a link to <a href="foo">foo</a>. This is an <a href="bar">aliased link</a>.

The easiest way I could think of to do this was through two regex replaces. Say my example string is called $def; here is the code I've tried to make these replacements:

$pattern1 = '/\[(.*?)?\]/m';
$replace1 = '<a href="$1">$1</a>';
$def = preg_replace($pattern1, $replace1, $def);

$pattern2 = '/\[([^]]*?)(?:\|([^]]*?))\]/m';
$replace2 = '<a href="$2">$1</a>';
$def = preg_replace($pattern2, $replace2, $def);

(I assumed it would be easier to do it using two regexes, but if there's a simpler one-regex solution I'd love to know.)

However, I've clearly got something wrong with the regex, as this is what happens when I echo $def (the links are just illustrative for now, the destination isn't important):

This is an example definition. Here is a link to foo. This is an aliased link|bar.

And the HTML:

"This is an example definition. Here is a link to "
<a href="foo">foo</a>
". This is an" 
<a href="aliased link|bar">aliased link|bar</a>
"."

Can anyone advise what I need to do to fix the regex to get my desired result? I'm especially confused because when I test this regex in www.regex101.com, it seems to do exactly what I think it should do.

I'm using PHP 7.4.6 on Google Chrome, with XAMPP and Apache.

Your second regex isn't wrong, but it doesn't do anything because the first preg_replace has already replaced both links.
– gdros, Commented Mar 27, 2021 at 18:53
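
The comment above can be checked with a short sketch (not from the thread). It counts how many replacements the second preg_replace call makes after the first has run, then tries the two calls in the opposite order. The names $afterFirst, $count and $reordered are illustrative, and swapping the order is only an assumption about one possible fix, not something the question or the comment proposes.

```php
<?php
$def = 'This is an example definition. Here is a link to [foo]. This is an [aliased link|bar].';

$pattern1 = '/\[(.*?)?\]/m';                   // any bracketed expression
$pattern2 = '/\[([^]]*?)(?:\|([^]]*?))\]/m';   // bracketed expression containing a pipe

// Original order: pattern1 consumes both bracketed expressions,
// so pattern2 has nothing left to match.
$afterFirst = preg_replace($pattern1, '<a href="$1">$1</a>', $def);
preg_replace($pattern2, '<a href="$2">$1</a>', $afterFirst, -1, $count);
echo $count, "\n"; // 0

// Assumed alternative: handle the piped form first, then the plain form.
$reordered = preg_replace($pattern2, '<a href="$2">$1</a>', $def);
$reordered = preg_replace($pattern1, '<a href="$1">$1</a>', $reordered);
echo $reordered, "\n";
// This is an example definition. Here is a link to <a href="foo">foo</a>. This is an <a href="bar">aliased link</a>.
```

Reordering works for this input because the generated HTML contains no square brackets, so the second pass only sees the remaining [foo].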

The fourth bird · Accepted Answer · 2021-03-27 18:58:05Z


Note that in the pattern that you used, you can exclude the | by adding it to the first negated character class, which prevents some backtracking. The quantifier for the negated character class also does not have to be non-greedy (*?), because the character class cannot cross the closing ].

You could use 2 capture groups where the second group is in an optional part and check for the presence of group 2 using preg_replace_callback.

\[([^][|]+)(?:\|([^][]+))?]

The pattern matches:

\[ Match [
([^][|]+) Capture group 1: match 1 or more characters other than [, ] and |
(?:\|([^][]+))? Optional non-capturing group: match |, then capture in group 2 one or more characters other than [ and ]
] Match the closing ]
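
As a rough illustration of the breakdown above, the sketch below runs the pattern against each link form on its own with preg_match and inspects the resulting arrays. The variable names $plain, $aliased and $href are made up for this example. It relies on preg_match omitting trailing groups that did not participate in the match, which is why the presence of index 2 can be used to detect the pipe form.

```php
<?php
$pattern = '/\[([^][|]+)(?:\|([^][]+))?]/';

// Plain link: the optional group does not take part, so index 2 is absent.
preg_match($pattern, '[foo]', $plain);
var_dump(count($plain)); // int(2)

// Aliased link: group 1 is the link text, group 2 the destination.
preg_match($pattern, '[aliased link|bar]', $aliased);
var_dump(count($aliased)); // int(3)

// Fall back to group 1 when group 2 is missing.
$href = $aliased[2] ?? $aliased[1];
echo $href, "\n"; // bar
```

The null coalescing operator (??) used here is an alternative way to express the same fallback as an explicit array_key_exists check.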


$pattern = "/\[([^][|]+)(?:\|([^][]+))?\]/";
$s = "This is an example definition. Here is a link to [foo]. This is an [aliased link|bar].";
$s = preg_replace_callback($pattern, function($match){
    $template = '<a href="%s">%s</a>';
    return sprintf($template, array_key_exists(2, $match) ? $match[2] : $match[1], $match[1]);
}, $s);

echo $s;

Output

This is an example definition. Here is a link to <a href="foo">foo</a>. This is an <a href="bar">aliased link</a>.

edited Mar 27, 2021 at 18:58

answered Mar 27, 2021 at 18:41

The fourth bird


4 Comments

Lou Over a year ago

That's fantastic, the answer works perfectly, thank you! Just one question - why do you use greedy matches for the two capture groups, i.e. + instead of +?? I always thought it was more performant to use lazy matches, especially as the capture groups in question will still capture everything until they reach a ] character in this case.

The fourth bird Over a year ago

@Lou When you use a lazy match, there will be backtracking. You don't need a lazy quantifier in this case, because a greedy quantifier on the negated character class cannot cross the ].

Lou Over a year ago

Ah okay, so + would actually perform better than +?, potentially?

The fourth bird Over a year ago

@Lou In this case it does. But take, for example, a scenario where you want to match qq and cannot use a negated character class [^q] (you would never reach the qq, because that class cannot get past the first q). There, depending on the length of the string, a non-greedy match can take fewer steps according to the regex101 tool when the qq is located early in the string. See this vs this and, with a shorter string, this vs this.
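
For the bracket pattern discussed in these comments, a small sketch can at least confirm that the greedy and lazy variants find the same matches. It says nothing about step counts, which come from the regex101 tool rather than from PHP. The lazy variant $lazy is written here purely for comparison and does not appear in the answer.

```php
<?php
$s = 'This is an example definition. Here is a link to [foo]. This is an [aliased link|bar].';

$greedy = '/\[([^][|]+)(?:\|([^][]+))?]/';   // pattern from the answer
$lazy   = '/\[([^][|]+?)(?:\|([^][]+?))?]/'; // same pattern with lazy quantifiers (illustrative)

preg_match_all($greedy, $s, $greedyMatches, PREG_SET_ORDER);
preg_match_all($lazy, $s, $lazyMatches, PREG_SET_ORDER);

// Both variants are bounded by the same delimiters, so the captures agree.
var_dump($greedyMatches === $lazyMatches); // bool(true)
```

Because the negated character classes already stop at ], [ and |, laziness cannot change what is captured here; it only changes how the engine gets there.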
