Question (score 15)

I'm trying to match a string that may appear over multiple lines. It starts and ends with a specific string:

{a}some string
can be multiple lines
{/a}

Can I grab everything between {a} and {/a} with a regex? It seems the . doesn't match newlines, but I've tried the following with no luck:

$template = preg_replace( $'/\{a\}([.\n]+)\{\/a\}/', 'X', $template, -1, $count );
echo $count; // prints 0

It matches . or \n when they're on their own, but not together!

3 Answers

Answer (score 32)

Use the s modifier:

$template = preg_replace( '/\{a\}(.+)\{\/a\}/s', 'X', $template, -1, $count );
//                                           ^ s modifier: . now matches newlines too
echo $count;
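A minimal self-contained sketch of the s-modifier fix, assuming a hypothetical sample `$template` containing a single {a}...{/a} block. Note that the dot has to appear outside square brackets for s to have any effect: inside [...] it is just a literal dot, so `[.\n]+` would still fail even with s.

```php
<?php
// Hypothetical sample input: one {a}...{/a} block spanning several lines.
$template = "before\n{a}some string\ncan be multiple lines\n{/a}\nafter";

// With the s modifier, . also matches "\n", so (.+) can cross line breaks.
$result = preg_replace('/\{a\}(.+)\{\/a\}/s', 'X', $template, -1, $count);

echo $count, "\n";   // 1
echo $result, "\n";  // "before", "X", "after" on separate lines
```

Without the trailing s, the same call reports a count of 0, because (.+) cannot get past the first line break.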

2 Comments

Awesome, I knew it would be something simple like that!
Also, I just found that this info IS on the PHP website, even though I'd never found it when looking before... php.net/manual/en/reference.pcre.pattern.modifiers.php
Answer (score 7)

I think you've got more problems than just the dot not matching newlines, but let me start with a formatting recommendation. You can use just about any punctuation character as the regex delimiter, not just the slash ('/'). If you use another character, you won't have to escape slashes within the regex. I understand '%' is popular among PHPers; that would make your pattern argument:

'%\{a\}([.\n]+)\{/a\}%'
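To check that switching delimiters changes only the escaping and not the meaning, here is a small sketch comparing a slash-delimited and a %-delimited pattern with preg_match on a hypothetical one-line input. It deliberately uses (.+) instead of [.\n]+ so that there is actually something to match.

```php
<?php
$text = "{a}one{/a}";

// Slash delimiter: the slash inside {/a} has to be escaped as \/.
$slash = preg_match('/\{a\}(.+)\{\/a\}/', $text, $m1);

// Percent delimiter: the slash needs no escaping at all.
$percent = preg_match('%\{a\}(.+)\{/a\}%', $text, $m2);

var_dump($slash === $percent && $m1 === $m2); // bool(true)
```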

Now, the reason that regex didn't work as you intended is that the dot loses its special meaning when it appears inside a character class (the square brackets), so [.\n] just matches a literal dot or a linefeed. What you were looking for was (?:.|\n), but I would have recommended matching the carriage return as well as the linefeed:

'%\{a\}((?:.|[\r\n])+)\{/a\}%'
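A quick sketch of what the character class really matches compared with the alternation, using hypothetical sample strings: [.\n]+ rejects ordinary text but accepts content made only of dots and newlines, while (?:.|[\r\n])+ accepts anything.

```php
<?php
$text = "{a}abc\ndef{/a}";

// Inside [...] the dot is a literal character, so [.\n]+ only accepts
// runs of "." and "\n". The letters make this match fail.
$classOnLetters = preg_match('%\{a\}([.\n]+)\{/a\}%', $text);

// The same class happily matches content made only of dots and newlines.
$classOnDots = preg_match('%\{a\}([.\n]+)\{/a\}%', "{a}..\n.{/a}");

// (?:.|[\r\n]) means "any character except a newline, or a CR, or an LF".
$alternation = preg_match('%\{a\}((?:.|[\r\n])+)\{/a\}%', $text, $m);

var_dump($classOnLetters, $classOnDots, $alternation); // int(0), int(1), int(1)
```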

That's because the word "newline" can refer to the Unix-style "\n", Windows-style "\r\n", or older-Mac-style "\r". Any given web page may contain any of those or a mixture of two or more styles; a mix of "\n" and "\r\n" is very common. But with /s mode (also known as single-line or DOTALL mode), you don't need to worry about that:

'%\{a\}(.+)\{/a\}%s'
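A small sketch showing that the /s version copes with all three line-ending styles at once, using a hypothetical input that mixes "\n", "\r\n" and "\r":

```php
<?php
// Mixed line endings: Unix "\n", Windows "\r\n", old Mac "\r".
$text = "{a}line1\nline2\r\nline3\rline4{/a}";

// In /s (DOTALL) mode the dot matches every character, CR and LF included.
preg_match('%\{a\}(.+)\{/a\}%s', $text, $m);

var_dump($m[1] === "line1\nline2\r\nline3\rline4"); // bool(true)
```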

However, there's another problem with the original regex that's still present in this one: the + is greedy. That means that if there's more than one {a}...{/a} sequence in the text, the first time your regex is applied it will match all of them, from the first {a} to the last {/a}. The simplest way to fix that is to make the + ungreedy (a.k.a. "lazy" or "reluctant") by appending a question mark:

'%\{a\}(.+?)\{/a\}%s'
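A sketch of the greedy/lazy difference on a hypothetical input containing two {a}...{/a} blocks, using preg_match_all to collect every match and preg_replace to mirror the original question:

```php
<?php
$text = "{a}first{/a} middle {a}second{/a}";

// Greedy: .+ runs on to the LAST {/a}, producing one big match.
$greedy = preg_match_all('%\{a\}(.+)\{/a\}%s', $text, $g);

// Lazy: .+? stops at the FIRST {/a} after each {a}.
$lazy = preg_match_all('%\{a\}(.+?)\{/a\}%s', $text, $l);

print_r($g[1]); // one element: "first{/a} middle {a}second"
print_r($l[1]); // two elements: "first", "second"

// Replacing each block separately only works with the lazy version.
$replaced = preg_replace('%\{a\}(.+?)\{/a\}%s', 'X', $text);
echo $replaced, "\n"; // X middle X
```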

Finally, I don't know what to make of the '$' before the opening quote of your pattern argument. I don't do PHP, but that looks like a syntax error to me. If someone could educate me in this matter, I'd appreciate it.

2 Comments

Oh, that must be a typo; I was originally using a variable there and replaced it with a string for this example.
This was a great explanation. Cheers for this.
Answer (score 3)

From http://www.regular-expressions.info/dot.html:

"The dot matches a single character, without caring what that character is. The only exception are newline characters."

You will need to add the s flag after the closing delimiter of your expression (e.g. /.../s).
