
I have a foreach as follows:

        foreach($data as $r=>$d)
          {
            $return = $return . "<tr>
            <td>
            ".$d["client_id"]."
            </td>
        ......
            <td>
            ".$d["date_stamp"]."
            </td>
            </tr>";
          }

on my data this takes more than 2 seconds to process! However, if I do the following:

      foreach($data as $r=>$d)
        {
          $now= "<tr>
          <td>
          ".$d["client_id"]."
          </td>
      ......
          <td>
              ".$d["date_stamp"]."

          </td>
          </tr>";
          $return = $return.$now;
        } 

it takes only 0.2 seconds.

Well, OK, you might say "fine, use the second approach", and sure, I will. But it is a mystery to me WHY there is such a great performance difference between the two approaches. Any ideas welcome. Thanks.

adding a test case:

   //////////function to get time
    function parsemicrotime(){
       list($usec, $sec) = explode(" ",microtime());
       return ((float)$usec + (float)$sec);
       }

    ////////////define test array
    $a = array();
    for($i = 0; $i < 5000; $i ++)//generate 5k rows
      {
        for($k = 0; $k < 6; $k++)//let's have just 6 columns
          {
            $a[$i]["test_".$k] = 'test string '.$i.' / '.$k.' - note that the size of the $output string makes a huge difference ';
          }
      }

    ///////////////first test
    $time_start = parsemicrotime();
        $output = '';
        foreach($a as $row=>$columns)
          {
            $output = $output ."
            <tr>
              <td>".$columns["test_0"]. "</td>
              <td>" .$columns["test_1"]. "</td>
              <td>" .$columns["test_2"]. "</td>
              <td>" .$columns["test_3"]. "</td>
              <td>" .$columns["test_4"]. "</td>
              <td>" .$columns["test_5"]. "</td>
            </tr>";
          }
    $approach_1_result = parsemicrotime()-$time_start;


    /////////////second test
    $time_start2 = parsemicrotime();
        $output2 = '';
        foreach($a as $row2=>$columns2)
          {
            $now2= "
            <tr>
              <td>".$columns["test_0"]. "</td>
              <td>" .$columns["test_1"]. "</td>
              <td>" .$columns["test_2"]. "</td>
              <td>" .$columns["test_3"]. "</td>
              <td>" .$columns["test_4"]. "</td>
              <td>" .$columns["test_5"]. "</td>
            </tr>";
            $output2 = $output2 .$now2;
          }
    $approach_2_result = parsemicrotime()-$time_start2;


    /////////////third test
    $time_start3 = parsemicrotime();
    ob_start();
        $output3 = '';
        foreach($a as $row3=>$columns3)
          {
            echo "
            <tr>
              <td>".$columns["test_0"]. "</td>
              <td>" .$columns["test_1"]. "</td>
              <td>" .$columns["test_2"]. "</td>
              <td>" .$columns["test_3"]. "</td>
              <td>" .$columns["test_4"]. "</td>
              <td>" .$columns["test_5"]. "</td>
            </tr>";
          }
    $output3 = ob_get_clean();
    $approach_3_result = parsemicrotime()-$time_start3;


    die("first test:".$approach_1_result."<br>second test:".$approach_2_result."<br>third test:".$approach_3_result);
  • Is this something you experienced more than once, or just a one-time situation? Like, CPU busy with something else, slow network, or whatever. Commented Oct 13, 2011 at 13:49
  • Because you actually measured something else. Commented Oct 13, 2011 at 13:49
  • Reproduce the problem minimally. Commented Oct 13, 2011 at 13:53
  • How many iterations are there? However: sounds strange, anyway ... Commented Oct 13, 2011 at 14:01
  • Yes, it is a two-dimensional array; time is measured right before the foreach and right after it. Tried several times with both approaches, results +- the same every time. Using echo is even quicker, but the difference between echo and $return=$return.$new; is insignificant (and adding ob_start() etc. might even slow it down a little). Commented Oct 13, 2011 at 14:39

2 Answers


I did a few similar experiments using a generated array:

$a = [];
for($i = 0; $i < 10000; $i ++)
  $a[] = $i;

To time them I simply store the microtime before the execution and subtract it from the microtime after the execution. I executed the code 10 times and took the average.
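In code, the harness was roughly the following (a minimal sketch; `benchmark` is just an illustrative helper name, and `microtime(true)` returns the timestamp directly as a float):

    // Run a callable $n times and return the average wall-clock seconds.
    function benchmark($fn, $n = 10) {
        $total = 0.0;
        for ($i = 0; $i < $n; $i++) {
            $start = microtime(true);
            $fn();
            $total += microtime(true) - $start;
        }
        return $total / $n;
    }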

First I tried something similar to your first approach:

$output = '';
foreach($a as $k => $v)
  $output = $output . "some static text" . $v . "some other text";

This recorded an insane time of ~3s! I then tried the same code using single quotes and got the same result.

I then changed the concatenation line to:

$output .= 'some static text' . $v . 'some other text';

This resulted in a time of ~0.007s, ~429 times faster!

Finally I changed the code to:

$output = '';
ob_start();
foreach($a as $k => $v)
  echo 'some static text' . $v . 'some other text';
$output = ob_get_clean();

This scored marginally slower than the .= approach (still ~0.007s).

Disclaimer: everything that follows is just my intuition about why the times are what they are.

Now, I'm not an expert on PHP internals, but I'd guess the first method is so much slower because it has to create a new string and copy the old one (which is slowly growing toward its final size of ~350,000 characters) 10,000 times, and copying is typically a fairly expensive operation. The .= approach, however, simply extends the original string, avoiding the copy operations. The buffered approach performs similarly, probably because writing to the output buffer costs about the same as extending a variable, with ob_start and ob_get_clean adding the marginal overhead.
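For reference, the three variants can be run side by side as one self-contained script (a sketch; absolute numbers will of course vary by PHP version and hardware):

    // Build the same 10,000-element test array as above.
    $a = [];
    for ($i = 0; $i < 10000; $i++)
        $a[] = $i;

    // Variant 1: full reassignment copies the growing string every iteration.
    $start = microtime(true);
    $output = '';
    foreach ($a as $v)
        $output = $output . 'some static text' . $v . 'some other text';
    $t1 = microtime(true) - $start;

    // Variant 2: .= appends in place, extending the existing buffer.
    $start = microtime(true);
    $output = '';
    foreach ($a as $v)
        $output .= 'some static text' . $v . 'some other text';
    $t2 = microtime(true) - $start;

    // Variant 3: echo into an output buffer, collected once at the end.
    $start = microtime(true);
    ob_start();
    foreach ($a as $v)
        echo 'some static text', $v, 'some other text';
    $output = ob_get_clean();
    $t3 = microtime(true) - $start;

    printf("reassign: %.4fs   .=: %.4fs   ob: %.4fs\n", $t1, $t2, $t3);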


2 Comments

I'm not an expert either, but I had a look at the implementation of PHP 5.3. It looks to me like both operators result in the same call: case ZEND_CONCAT: case ZEND_ASSIGN_CONCAT: return (binary_op_type) concat_function; This function can either do an emalloc, or an erealloc if the result and the first parameter are the same. That leads to my question: with which PHP version did you run your test, and could you try to fill your array with much longer texts? That would prevent a memory block from being reused in erealloc.
This is with 5.4beta1 (I can't resist the short array syntax). I'll rerun the tests with longer texts later today and post the results.

There is a difference between the two approaches, but it's astonishing that it should make such a big difference.

Every time you concatenate two strings, PHP has to allocate a new memory block big enough for the new string. For really large strings it has to find an even larger block of contiguous memory. Because the block has to be a bit larger each time, PHP cannot reuse the former blocks (they are too small). So as your string grows, it becomes slower to find another memory block and copy the string.

  1. In your first example you have a lot of . operations in a single loop. With every . the already large string becomes larger.

  2. In your second example you collect all the . operations of the loop in $now first. The variable $now stays relatively small, so these concatenations are fast. Only once per loop do you need to find a big memory block.

As already mentioned, I'm a bit surprised that it should make such a big difference, but depending on the number of iterations it could be possible. A pattern that sidesteps the growing string entirely is sketched below.
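As an illustration only (a minimal sketch with made-up row data, not from the original question): collect the per-row chunks in an array and implode once at the end, so the big string is allocated a single time at its final size.

    // Hypothetical row data, just for illustration.
    $data = array(
        array('client_id' => 1, 'date_stamp' => '2011-10-13'),
        array('client_id' => 2, 'date_stamp' => '2011-10-14'),
    );

    $rows = array();
    foreach ($data as $d) {
        // Each chunk is small, so these concatenations stay cheap.
        $rows[] = '<tr><td>' . $d['client_id'] . '</td><td>' . $d['date_stamp'] . '</td></tr>';
    }
    // One allocation at the final size instead of one per iteration.
    $return = implode('', $rows);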

