So, I'll start off with posting some code:

$output = preg_replace([
  '/#(.*?)/i'
], [
  '<h1>$1</h1>'
], "#Input");

And that ended up outputting:

<h1></h1>
Input

The HTML output I'd like to achieve is <h1>Input</h1> from the input #Input, kind of like Markdown, but this is for a basic editing system.

I ran the pattern through a regex debugger, and the trace showed that the first group was empty and that the overall match (group 0) was just the #.

To my knowledge (I was told this), only the parts wrapped in ( ... ) are captured into groups, and the groups are numbered from left to right, from $1 up to $x.

Sorry for the overused REGEX questions.

2 Answers

You have an extra "?" in your regex. It makes .* lazy, so the group matches as little as possible, which here is nothing.

Try with:

$output = preg_replace([
  '/#(.*)/is'
], [
  '<h1>$1</h1>'
], "#Input");

Since the pattern contains no letters, case insensitivity has no effect, so you could write:

$output = preg_replace([
  '/#(.*)/s'
], [
  '<h1>$1</h1>'
], "#Input");

Of course, for a real solution I'd make the match narrower, depending on your actual requirements. For example:

$output = preg_replace([
  '/#([^#\s]+)/s'
], [
  '<h1>$1</h1>'
], $string);
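
To illustrate why the narrower class can matter, here is a minimal sketch comparing the two patterns. The two-line value of $string is a made-up sample, not something from the question:

```php
<?php
// Hypothetical two-line input: a heading marker, then ordinary text.
$string = "#Input\nplain text";

// With the s modifier, "." also matches newlines, so the greedy group
// swallows everything up to the end of the string.
$broad = preg_replace('/#(.*)/s', '<h1>$1</h1>', $string);

// The negated class stops at the first whitespace character or "#".
$narrow = preg_replace('/#([^#\s]+)/', '<h1>$1</h1>', $string);

var_dump($broad);   // "<h1>Input\nplain text</h1>"
var_dump($narrow);  // "<h1>Input</h1>\nplain text"
```

Whether stopping at whitespace is right depends on whether a heading may contain spaces, so treat the character class as a starting point.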


3 Comments

FYI: The i modifier is redundant here, and . with the s modifier matches any character, including newlines, which might be a problem if the input is an already marked-up string.
Not how I would construct this regex, just focusing on the immediate problem and I assume this is not the full code for the question, but your point is taken. I'll work on improving my answer for future visitors. Thanks.
Yeah, apologies, it's just part of my code base, which required the i modifier as much as the s. I'll remove the s for future visitors so there's no bad practice going around. @WiktorStribiżew and Answer OP

The problem here is that the lazy dot pattern, .*?, appears at the end of the regex, and since nothing after it forces it to match any text, it matches none. Your regex matches a # and captures an empty string as Group 1.
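
A small sketch of that behaviour, using preg_match to inspect what each variant captures. The variable names are only for illustration:

```php
<?php
// Lazy quantifier at the end of the pattern: nothing forces it to expand,
// so Group 1 stays empty and only the "#" is consumed.
preg_match('/#(.*?)/', '#Input', $lazy);

// Greedy quantifier: takes as much as it can, so Group 1 is "Input".
preg_match('/#(.*)/', '#Input', $greedy);

// Replacing with the lazy pattern therefore only swaps out the "#".
$lazyOutput = preg_replace('/#(.*?)/', '<h1>$1</h1>', '#Input');

var_dump($lazy[0], $lazy[1]);  // "#" and ""
var_dump($greedy[1]);          // "Input"
var_dump($lazyOutput);         // "<h1></h1>Input"
```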

If you mean to actually match something, use, say

'/#(\S+)/'

to match a # and capture 1 or more non-whitespace chars into Group 1.

Instead of \S+, you might want to use a more restricted pattern (like \w+ for 1 or more word chars, [^<]+ to match 1 or more chars other than <, or [^\s<]+ to match 1+ chars other than whitespace and <).
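
As a rough comparison of those alternatives, the sketch below runs each pattern against two made-up inputs (both are assumptions chosen so that the patterns disagree) and collects what Group 1 captures:

```php
<?php
// Hypothetical sample inputs, chosen so the candidate patterns disagree.
$inputs = ['#Big-Title here', '#Title<br>rest'];

$patterns = ['/#(\S+)/', '/#(\w+)/', '/#([^<]+)/', '/#([^\s<]+)/'];

// $captures[input][pattern] holds the text captured by Group 1.
$captures = [];
foreach ($inputs as $input) {
    foreach ($patterns as $pattern) {
        preg_match($pattern, $input, $m);
        $captures[$input][$pattern] = $m[1];
    }
}

print_r($captures);
```

For '#Big-Title here', \S+ and [^\s<]+ capture "Big-Title", \w+ stops at the hyphen with "Big", and [^<]+ runs on to "Big-Title here". For '#Title<br>rest', only \S+ runs through the tag and captures "Title<br>rest"; the other three stop at "Title".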
