PHP Regex grouping not working as expected

Question

So, I'll start off with posting some code:

$output = preg_replace([
  '/#(.*?)/i'
], [
  '<h1>$1</h1>'
], "#Input");

And that ended up outputting:

<h1></h1>
Input

In the HTML output, the result I'd like to achieve is <h1>Input</h1> from the input #Input, kind of like Markdown, but this is for a basic editing system.

I ran the pattern through a regex debugger, and the trace showed that the first group was empty and the full match (group 0) was just the #.

To my knowledge (this is what I was told), only the parts of the pattern wrapped in ( ... ) are captured into groups, and the groups are numbered from left to right as $1 through $x.

Sorry for the overused REGEX questions.

yivi · Accepted Answer · 2016-12-27 13:03:14Z

3

You have an extra "?" in your Regex.

Try with:

$output = preg_replace([
  '/#(.*)/is'
], [
  '<h1>$1</h1>'
], "#Input");

Since the pattern doesn't contain any literal letters, the case-insensitive i modifier has no effect, so you could write:

$output = preg_replace([
  '/#(.*)/s'
], [
  '<h1>$1</h1>'
], "#Input");

And of course, if this were the actual solution I'd try to be a bit narrower on my match definition (depending on your actual requirements). E.g.:

$output = preg_replace([
  '/#([^#\s]+)/s'
], [
  '<h1>$1</h1>'
], $string);

Here you have it working. And here the final version.
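To make the difference between the broad and the narrow pattern concrete, here is a minimal sketch. The sample value of $string is a hypothetical input, not something from the question; the two patterns are the ones shown above.

```php
<?php
// Hypothetical input: two tagged words with a plain line in between.
$string = "#Title\nplain text\n#Other";

// Greedy dot with the s modifier: the dot also matches newlines,
// so a single match runs from the first # to the end of the string.
$greedy = preg_replace('/#(.*)/s', '<h1>$1</h1>', $string);

// Negated character class: the capture stops at whitespace or the next #,
// so each tag is wrapped on its own.
$narrow = preg_replace('/#([^#\s]+)/s', '<h1>$1</h1>', $string);

echo $greedy, PHP_EOL, '---', PHP_EOL, $narrow, PHP_EOL;
```

With this input the first call wraps everything after the first # in one <h1>, while the second produces a separate <h1> for Title and for Other and leaves the plain line alone.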

edited Dec 27, 2016 at 13:03

answered Dec 27, 2016 at 12:48

yivi


3 Comments

Wiktor Stribiżew Over a year ago

FYI: The i modifier is redundant here, and . with the s modifier matches any character, including newlines, which might be a problem if the input is an already marked-up string.

yivi Over a year ago

Not how I would construct this regex, just focusing on the immediate problem and I assume this is not the full code for the question, but your point is taken. I'll work on improving my answer for future visitors. Thanks.

Jack Hales Over a year ago

Yeah, apologies as it's just part of my code-base which required the i operator as much as the s, I'll remove the s from my for future visitors so there's no bad practise going around. @WiktorStribiżew and Answer OP

Wiktor Stribiżew · Accepted Answer · 2016-12-27 12:47:30Z

1

The problem here is that the lazy dot pattern (.*?) appears at the end of the regex, and since nothing after it forces it to match any text, it matches none. Your regex matches a # and captures an empty string as Group 1.
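A small sketch of that behaviour, using the pattern and input from the question. The anchored variant with $ is only an illustration of how a lazy group behaves once something follows it, not a recommended fix.

```php
<?php
// Lazy group at the very end of the pattern: nothing forces it to expand.
preg_match('/#(.*?)/', '#Input', $lazy);
var_dump($lazy[0]); // full match: string(1) "#"
var_dump($lazy[1]); // Group 1:    string(0) ""

// With an end anchor after the group, the lazy dot has to grow
// until the anchor can match.
preg_match('/#(.*?)$/', '#Input', $anchored);
var_dump($anchored[1]); // string(5) "Input"

// The replacement from the question: only "#" is replaced,
// and "Input" is left behind after the empty heading.
$output = preg_replace('/#(.*?)/', '<h1>$1</h1>', '#Input');
echo $output, PHP_EOL; // <h1></h1>Input
```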

If you mean to actually match something, use, say

'/#(\S+)/'

to match a # and capture 1 or more non-whitespace chars into Group 1.

Instead of \S+, you might want to use a more restricted pattern (like \w+ for 1 or more word chars, [^<]+ to match 1 or more chars other than <, or [^\s<]+ to match 1+ chars other than whitespace and <).
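To compare these alternatives side by side, the following sketch runs each pattern against two sample strings. The inputs are hypothetical and chosen only so that the patterns give different captures.

```php
<?php
// Hypothetical sample inputs, not taken from the question.
$inputs = [
    '#Input-1 text<b>bold</b>',
    '#Input-1<b>bold</b> text',
];

// The four capture patterns mentioned above.
$patterns = ['/#(\S+)/', '/#(\w+)/', '/#([^<]+)/', '/#([^\s<]+)/'];

$results = [];
foreach ($inputs as $input) {
    foreach ($patterns as $pattern) {
        preg_match($pattern, $input, $m);
        $results[$input][$pattern] = $m[1];
        printf("%-26s %-14s => %s\n", $input, $pattern, var_export($m[1], true));
    }
}
```

On the first input, [^<]+ runs through the space and captures 'Input-1 text'; on the second, \S+ runs into the markup and captures 'Input-1<b>bold</b>'. \w+ stops at the hyphen in both cases, and [^\s<]+ captures 'Input-1' in both.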

answered Dec 27, 2016 at 12:47

Wiktor Stribiżew

