13

I have a file in which some words are separated by several spaces. I need to clean the file by replacing each multi-space sequence with a single space. I have written the following statement, which does not work at all, so it seems I'm making a big mistake.

 $s = preg_replace("/( *)/", " ", $x);

My file is very simple. Here is a part of it:

Hjhajhashsh dwddd dddd sss   ddd wdd ddcdsefe xsddd   scdc yyy5ty    ewewdwdewde           wwwe ddr3r dce eggrg               vgrg fbjb   nnn  bh jfvddffv mnmb   weer ffer3ef f4r4 34t4 rt4t4t 4t4t4t4t    ffrr  rrr  ww w w ee3e iioi   hj   hmm  mmmmm mmjm lk ;’’ kjmm  ,,,, jjj hhh  lmmmlm m mmmm lklmm jlmm m

3 Answers

34

Your regex replaces any number of spaces (including zero) with a space. You should only replace two or more (after all, replacing a single space with itself is pointless):

$s = preg_replace("/ {2,}/", " ", $x);
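
To see the difference between the two patterns, here is a minimal sketch. The sample string is made up for illustration (shortened from the question's data), and the variable names $broken and $fixed are not from the original post.

```php
<?php
// Hypothetical sample input, shortened from the question's data.
$x = "sss   ddd wdd    xsddd";

// Original pattern: " *" also matches the empty string, so a space gets
// inserted at positions where there was none before.
$broken = preg_replace("/( *)/", " ", $x);

// Corrected pattern: only runs of two or more spaces are replaced.
$fixed = preg_replace("/ {2,}/", " ", $x);

var_dump($broken === $fixed); // the two results differ
echo $fixed, PHP_EOL;         // sss ddd wdd xsddd
```

Using "/ +/" would give the same result as "/ {2,}/"; it just also rewrites single spaces with themselves.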

3

What I usually do to clean up multiple spaces is:

while (strpos($x, '  ') !== false) {
   $x = str_replace('  ', ' ', $x);
}

Conditions/hypotheses:

  1. strings with multiple spaces are rare
  2. two spaces are by far more common than three or more
  3. preg_replace is expensive in terms of CPU
  4. copying characters to a new string should be avoided when possible

Of course, if condition #1 is not met, this approach does not make sense, but in practice that condition usually holds.

If #1 is met, but any of the others is not (this may depend on the data, the software (PHP version) or even the hardware), then the following may be faster:

if (strpos($x, '  ') !== false) {
   $x = preg_replace('/  +/', ' ', $x); // i.e.: '/␣␣+/'
}

Anyway, if multiple spaces appear only in, say, 2% of your strings, the important thing is the preventive check with strpos, and you probably don't care much about optimizing the remaining 2% of cases.
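
Which variant is faster depends on the data, so it may be worth measuring on your own strings. The following is only a rough timing sketch, not a rigorous benchmark: the function names collapseWithLoop and collapseWithRegex, the sample inputs, and the iteration count are all assumptions made for illustration.

```php
<?php
// Hypothetical wrapper around the str_replace loop shown above.
function collapseWithLoop(string $x): string
{
    while (strpos($x, '  ') !== false) {
        $x = str_replace('  ', ' ', $x);
    }
    return $x;
}

// Hypothetical wrapper around the strpos check plus preg_replace.
function collapseWithRegex(string $x): string
{
    if (strpos($x, '  ') !== false) {
        $x = preg_replace('/  +/', ' ', $x);
    }
    return $x;
}

// Made-up inputs: no double spaces, a single pair, and a run of 15 spaces.
$samples = [
    'no doubles here',
    'one  pair',
    'long' . str_repeat(' ', 15) . 'run',
];

foreach (['collapseWithLoop', 'collapseWithRegex'] as $fn) {
    $start = microtime(true);
    for ($i = 0; $i < 100000; $i++) {
        foreach ($samples as $s) {
            $fn($s);
        }
    }
    printf("%s: %.3f s\n", $fn, microtime(true) - $start);
}
```

Both functions return the same strings; only the timings differ, and those will vary with the PHP version and the mix of inputs.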

11 Comments

PHP's regex engine is highly optimized. You should profile this - I'm willing to bet that this approach will be much slower than a single regex replace.
@TimPietzcker: if multiple spaces are rare enough you already lost your bet, because one call to strpos is for sure less expensive than one call to preg_replace
Can you try it on the example string the OP gave?
@TimPietzcker, I'm quite sure that my loop is much slower than a single preg_replace on the OP's example, mainly because the function call overhead dominates: with runs of 15 spaces, as in this case, strpos is called 5 times and str_replace 4 times. This example is absolutely not realistic, though.
I did some profiling of this solution vs. the preg_replace() solution from @TimPietzcker on a PHP 7.0 system. This solution is nearly identical in duration for strings with 1-2 spaces, but it takes about twice as long for more than 2 spaces. So the preg_replace() solution is preferred.
1
// Your input
$str = "Hjhajhashsh dwddd dddd sss   ddd wdd ddcdsefe xsddd   scdc yyy5ty    ewewdwdewde           wwwe ddr3r dce eggrg               vgrg fbjb   nnn  bh jfvddffv mnmb   weer ffer3ef f4r4 34t4 rt4t4t 4t4t4t4t    ffrr  rrr  ww w w ee3e iioi   hj   hmm  mmmmm mmjm lk ;’’ kjmm  ,,,, jjj hhh  lmmmlm m mmmm lklmm jlmm m";
echo $str.'<br>';

$output = preg_replace('!\s+!', ' ', $str); // Replace each run of whitespace with a single space.

echo $output;

1 Comment

Can you explain the difference between !\s+! and /\s+/? I don't understand; it looks like both do the same thing.
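
For what it's worth, the two patterns should behave identically: in PHP's PCRE functions the character that surrounds the pattern is only a delimiter, and any non-alphanumeric, non-backslash, non-whitespace character can be used. The difference that does matter is between \s and a literal space, since \s also matches tabs and newlines. A small sketch with a made-up input string:

```php
<?php
// Made-up input containing tabs and newlines as well as spaces.
$str = "a  b\t\tc\n\nd";

$bang  = preg_replace('!\s+!', ' ', $str); // "!" as delimiter
$slash = preg_replace('/\s+/', ' ', $str); // "/" as delimiter

// Literal-space pattern from the first answer: tabs and newlines are kept.
$spacesOnly = preg_replace('/ {2,}/', ' ', $str);

var_dump($bang === $slash);                  // bool(true)
var_dump($bang === "a b c d");               // bool(true)
var_dump($spacesOnly === "a b\t\tc\n\nd");   // bool(true)
```

If the file has one record per line, the literal-space pattern may be the safer choice, because \s+ would also merge the lines.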
