7

While testing an answer to another user's question I found something I don't understand. The problem was to replace every run of literal \t, \n and \r sequences (a backslash followed by the letter, not the control characters) in a string with a single space.

Now, the first pattern I tried was:

/(?:\\[trn])+/

which surprisingly didn't work. I tried the same pattern in Perl and it worked fine. After some trial and error I found that PHP wants 3 or 4 backslashes for that pattern to match, as in:

/(?:\\\\[trn])+/

or

/(?:\\\[trn])+/

To my surprise, both of these patterns work. Why are the extra backslashes necessary?

1
  • 1
    Perl regexes are integrated into the language, so you need only two backslashes. Commented Jan 27, 2010 at 11:42

4 Answers

13

You need 4 backslashes in the PHP source to represent 1 literal backslash in the regex, because the pattern is unescaped twice:

  • The PHP string parser turns each \\ into \, so 4 backslashes become 2 ("\\\\" -> \\)
  • The regex engine then reads \\ as 1 literal backslash (\\ -> \)
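
The two unescaping steps can be observed directly. The following is a minimal sketch, not taken from the original answer; the variable names and the sample input are assumptions made for illustration.

```php
<?php
// Four backslashes in the source code of the literal.
$pattern = '/(?:\\\\[trn])+/';

// After PHP parses the literal, the string holds only two backslashes,
// which is what the regex engine receives.
echo $pattern, PHP_EOL; // prints /(?:\\[trn])+/

// Hypothetical input. It is single-quoted, so \t, \n and \r stay as
// two-character sequences (backslash + letter), not control characters.
$input = 'a\t\nb\rc';

// Each run of literal \t, \n, \r sequences is replaced by one space.
$result = preg_replace($pattern, ' ', $input);
echo $result, PHP_EOL; // prints a b c
```

Echoing the pattern before passing it to preg_replace() is a quick way to see what the regex engine will actually be given.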

From the PHP doc,

escaping any other character will result in the backslash being printed too

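Hence for \\\[,

  • The PHP string parser turns the first \\ into \ and keeps the third backslash, because \[ is not a recognized escape sequence ("\\\[" -> \\[)
  • The regex engine then reads \\ as 1 literal backslash, and the [ that follows opens the character class (\\[ -> \[)

Yes, it works, but it is not good practice.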
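
As a hedged illustration of the point above (variable names and test strings are assumptions, not part of the original answer), comparing the parsed strings shows that the three- and four-backslash literals are the same pattern, while the two-backslash literal is a different one:

```php
<?php
$two   = '/(?:\\[trn])+/';   // parsed string: /(?:\[trn])+/
$three = '/(?:\\\[trn])+/';  // parsed string: /(?:\\[trn])+/
$four  = '/(?:\\\\[trn])+/'; // parsed string: /(?:\\[trn])+/

var_dump($three === $four); // bool(true)

// Hypothetical input containing a literal backslash followed by "t".
$input = 'x\ty';

// With two backslashes the regex engine sees \[ (an escaped bracket),
// so the pattern looks for the literal text "[trn]" instead.
var_dump(preg_match($two, $input));   // int(0)
var_dump(preg_match($four, $input));  // int(1)
var_dump(preg_match($two, '[trn]'));  // int(1)
```

This is also why the two-backslash pattern from the question did not fail with an error: it is a valid regex, just not the intended one.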


8

It works in Perl because you write the pattern directly as a regex literal: /(?:\\[trn])+/

In PHP, however, the pattern is passed as a string, so each backslash needs an extra level of escaping:

"/(?:\\\\[trn])+/"

The regex \\, which matches a single backslash, becomes '/\\\\/' as a PHP preg string.


2

The regular expression is just /(?:\\[trn])+/. But since you need to escape the backslashes in string declarations as well, each backslash must be expressed with \\:

"/(?:\\\\[trn])+/"
'/(?:\\\\[trn])+/'

Just three backslashes also work, because PHP doesn't recognize \[ as an escape sequence and leaves it unchanged. So \\ becomes \ but \[ stays \[.
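
A short sketch of why the quoting style makes no difference for these patterns (the variable names are assumptions for illustration): both quote styles treat \\ as an escape and leave \[ alone, whereas an escape such as \t is only interpreted inside double quotes.

```php
<?php
// \\ is unescaped and \[ is left alone in both quoting styles,
// so the resulting pattern strings are identical.
$sameFour  = '/(?:\\\\[trn])+/' === "/(?:\\\\[trn])+/";
$sameThree = '/(?:\\\[trn])+/' === "/(?:\\\[trn])+/";
var_dump($sameFour);  // bool(true)
var_dump($sameThree); // bool(true)

// The styles do differ for escapes that only double quotes recognize.
$singleLen = strlen('\t'); // backslash + "t"
$doubleLen = strlen("\t"); // one tab character
var_dump($singleLen); // int(2)
var_dump($doubleLen); // int(1)
```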

3 Comments

Then why do 3 backslashes work? And why aren't single quotes different from double quotes in this case?
@Gumbo: just so I know I understood correctly: this case works because \[ isn't a recognized escape sequence, and it does not become a literal open square bracket because the string is parsed left to right, so the second backslash is consumed by the one preceding it and the third is left as is?
@kemp: Yes, only the escape sequences listed in the manual are replaced.
-2

Use str_replace!

$code = str_replace(array("\t","\n","\r"),'',$code);

Should do the trick

3 Comments

This doesn't answer my question, and it is also wrong, because str_replace() can't substitute a run of the requested characters (however many there are) with a single space. It can only remove them all.
@kemp Yes, it does. If it doesn't remove them as is, try combinations of \r\n or \n\r.
No, you can't substitute, say, three (or any arbitrary number) of those with a single space unless you provide every possible combination. What your code does is just remove them all.
