I am having a hard time figuring out what is happening with this regexp, which is meant to match multiple whitespace characters:

$str = '   ';

if (preg_match_all('/\s{2,}/', $str, $matches)) {
    var_dump($matches);
}

If I replace the value of $str with three "real" spaces, it works as expected, so evidently the characters in $str are not all ordinary spaces (they were copied and pasted from another source). I still need to match them so that I can replace them with real spaces or something else.

My question: what are these space-looking characters in $str and, more importantly, how do I target them in a regexp?

1
  • Please reformulate the question so that it could be answered. Commented Apr 30, 2016 at 12:05

2 Answers

2

The middle character is a UTF-8 encoded non-breaking space (U+00A0, stored as the two bytes 0xC2 0xA0). Add the UTF-8 modifier u to your regex and it will work, e.g. /\s{2,}/u.

Outputs:

array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(4) "   "
  }
}
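To check what the characters actually are, it can help to dump the raw bytes. The following is a minimal sketch, not the asker's exact data: it assumes the string is a space, a UTF-8 non-breaking space, and a space, written with explicit byte escapes so that nothing is lost in copy and paste. The variable names are illustrative only.

```php
<?php
// Assumed input: space, non-breaking space (U+00A0 = bytes C2 A0), space.
$str = " \xC2\xA0 ";

// Show the raw bytes as hex to reveal the hidden character.
$hex = bin2hex($str);
echo $hex, "\n"; // 20c2a020

// Without the u modifier the pattern works byte by byte; the question
// reports that this fails to match the string.
$countBytes = preg_match_all('/\s{2,}/', $str, $byteMatches);

// With the u modifier the pattern works on UTF-8 characters, and \s
// also covers the non-breaking space.
$countUtf8 = preg_match_all('/\s{2,}/u', $str, $utf8Matches);

var_dump($countUtf8);              // int(1)
var_dump(strlen($utf8Matches[0][0])); // int(4): three characters, four bytes
```

The length of 4 agrees with the string(4) in the output above: the non-breaking space takes two bytes in UTF-8.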

0

The whitespace characters matched by \s may include the real space (code 0x20), the horizontal tab character (0x09), carriage return (0x0D), line feed (0x0A) and form feed (0x0C). So if you want to turn all of these characters into real spaces, you may use this line:

$str = preg_replace('/\s/', ' ', $str);

Or, if you want to replace a sequence of two or more whitespace characters with just a single real space, use this instead:

$str = preg_replace('/\s{2,}/', ' ', $str);
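For the string in the question, which per the other answer contains a UTF-8 non-breaking space, these replacements would probably need the u modifier as well. Here is a hedged sketch under that assumption; the sample input and the variable names are made up for illustration.

```php
<?php
// Assumed input: "a", space, non-breaking space (bytes C2 A0), space, "b".
$str = "a \xC2\xA0 b";

// Replace every whitespace character, including U+00A0, with a real space.
$each = preg_replace('/\s/u', ' ', $str);

// Collapse each run of two or more whitespace characters into one real space.
$collapsed = preg_replace('/\s{2,}/u', ' ', $str);

// Alternative: target only the non-breaking space, by code point.
$nbspOnly = preg_replace('/\x{00A0}/u', ' ', $str);

var_dump($each);      // string(5) "a   b"
var_dump($collapsed); // string(3) "a b"
var_dump($nbspOnly);  // string(5) "a   b"
```

Note that with the u modifier preg_replace returns NULL if the subject is not valid UTF-8, so it may be worth checking the result before using it.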
