
I have a log file of about 200,000 rows. Each row has the format:

AAA||BBB|C|DDD||

I currently parse the values with the following loop:

$fh = fopen($filename, 'r');
if ($fh === false) {
  return null;
}
$result = array();
// fgets() returns false at end of file, so test its result instead of feof()
while (($line = fgets($fh)) !== false) {
  $tokens = explode('||', $line);
  if (count($tokens) < 2) {
    continue; // skip blank or malformed lines
  }
  $a = $tokens[0];
  list($b, $c, $d) = explode('|', $tokens[1]);
  // now the values of AAA, BBB, C and DDD go into an array keyed by AAA
  $result[$a] = array('a' => $a, 'b' => $b, 'c' => $c, 'd' => $d);
}
fclose($fh);
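For reference, this is what the two explode() calls in the loop return for a single row. The sample row is illustrative, not taken from the real log.

```php
<?php
// Illustrative row in the AAA||BBB|C|DDD|| format, as fgets() returns it
// (with its trailing newline)
$line = "AAA||BBB|C|DDD||\n";

// First split on the double pipe: the key, the middle group, and the
// remainder after the final "||" (here just the newline)
$tokens = explode('||', $line);
// $tokens === ['AAA', 'BBB|C|DDD', "\n"]

// Second split on the single pipe inside the middle group
list($b, $c, $d) = explode('|', $tokens[1]);

echo "$b $c $d\n"; // prints "BBB C DDD"
```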

$result[$a] contains everything I need, but parsing takes about 2.1 seconds. What can I do to reduce the parsing time?

  • Benchmark and profile using tools designed for the task, like Xdebug. This will tell you where the bottleneck is. Commented May 2, 2013 at 6:23
  • I did. fgets() is the bottleneck. Commented May 2, 2013 at 6:25
  • Use fgetcsv() to read and explode in one go Commented May 2, 2013 at 6:27
  • The disadvantage of fgetcsv() is that I can't check for errors row by row and ignore or report the problematic line. Once a malformed line is detected, fgetcsv() will fail. Commented May 2, 2013 at 6:33
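With '|' as the only delimiter, each "||" produces an empty field, so a row in this format splits into seven fields with the wanted values at indexes 0, 2, 3 and 4. The sketch below shows that layout with str_getcsv() on one illustrative row, plus a per-row field-count check; treating "exactly 7 fields" as the definition of a well-formed row is an assumption, not something the thread states.

```php
<?php
// Illustrative row; enclosure and escape are passed explicitly with their
// documented default values
$row = str_getcsv('AAA||BBB|C|DDD||', '|', '"', '\\');
// $row === ['AAA', '', 'BBB', 'C', 'DDD', '', '']

// Assumed rule: a well-formed row has exactly 7 fields
$isValid = count($row) === 7;

// Empty slots in list() skip the empty field between "||"
list($a, , $b, $c, $d) = $row;
```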

3 Answers

Thanks for all the answers and comments. I benchmarked the following approaches, using the code below (the surrounding while-loop is omitted from each snippet):

// fgets (same code as in the question)
$tokens = explode('||', $line);
$a = $tokens[0];
list($b, $c, $d) = explode('|', $tokens[1]);
$result[$a] = array('a' => $a, 'b' => $b, 'c' => $c, 'd' => $d);

// fgetcsv
ini_set('auto_detect_line_endings', TRUE); // detects Unix, DOS or old Mac line endings; deprecated as of PHP 8.1
list($a, $nouse1, $b, $c, $d, $nouse2, $nouse3) = fgetcsv($fh, 200, '|');
$result[$a] = array('a' => $a, 'b' => $b, 'c' => $c, 'd' => $d);

// stream_get_line
$line = stream_get_line($fh, 200, PHP_EOL);
$tokens = explode('||', $line);
if(count($tokens) != 3) {
  continue;
}
$a = $tokens[0];
list($b, $c, $d) = explode('|', $tokens[1]);
$result[$a] = array('a' => $a, 'b' => $b, 'c' => $c, 'd' => $d);

// stream_get_line + str_getcsv
$line = stream_get_line($fh, 200, PHP_EOL);
list($a, $nouse1, $b, $c, $d, $nouse2, $nouse3) = str_getcsv($line, '|');
$result[$a] = array('a' => $a, 'b' => $b, 'c' => $c, 'd' => $d);

// fgets + str_getcsv
$line = fgets($fh);
list($a, $nouse1, $b, $c, $d, $nouse2, $nouse3) = str_getcsv($line, '|');
$result[$a] = array('a' => $a, 'b' => $b, 'c' => $c, 'd' => $d);
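One practical difference between the fgets() and stream_get_line() variants is that fgets() keeps the line ending while stream_get_line() stops at the delimiter and leaves it out of the returned string. The sketch below shows this on an in-memory stream standing in for the log file; memoryStream() is a hypothetical helper. It uses "\n" rather than PHP_EOL because PHP_EOL is "\r\n" on Windows, so it only matches when the file's line endings match the platform.

```php
<?php
// Hypothetical helper: wrap a string in a readable php://memory stream
function memoryStream(string $data)
{
    $fh = fopen('php://memory', 'r+');
    fwrite($fh, $data);
    rewind($fh);
    return $fh;
}

$data = "AAA||BBB|C|DDD||\nXXX||YYY|Z|WWW||\n";

// fgets() returns the line including its "\n"
$fh = memoryStream($data);
$withNewline = fgets($fh);
fclose($fh);

// stream_get_line() returns each line without the "\n" delimiter
$fh = memoryStream($data);
$first = stream_get_line($fh, 200, "\n");
$second = stream_get_line($fh, 200, "\n");
fclose($fh);
```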

All variants parse the same text file, at the same path, on the same test machine. The line format is:

AAA||BBB|C|DDD||

Here are the results (each variant run 3 times, timings averaged):

Unexpectedly, fgetcsv() is the slowest. But why?

Side note: stream_get_line() requires PHP 5 or later.
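The timing code itself isn't shown above. As an assumption, a minimal harness that runs one parser several times on the same file and averages the wall-clock time with microtime(true) could look like this; timeParser() and its parameters are hypothetical names.

```php
<?php
/**
 * Hypothetical harness: call $parser($filename) $runs times and return the
 * mean elapsed wall-clock time in seconds.
 */
function timeParser(callable $parser, string $filename, int $runs = 3): float
{
    $total = 0.0;
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        $parser($filename);
        $total += microtime(true) - $start;
    }
    return $total / $runs;
}
```

Any of the variants above can be wrapped in a closure that takes the filename and passed in as $parser, so every variant is timed the same way.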


File parsing in PHP is slow. I did some benchmarking between fgetcsv and a custom CSV function a while ago, and fgetcsv was a clear winner (by a factor of around 10, I think). You should be able to rearrange your code to use fgetcsv with '|' as the delimiter.
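A minimal sketch of that rearrangement, assuming the AAA||BBB|C|DDD|| format from the question; parseWithFgetcsv() is a hypothetical name, and skipping rows with fewer than 5 fields is an assumed rule for blank or short lines.

```php
<?php
/**
 * Sketch: parse AAA||BBB|C|DDD|| rows with fgetcsv() and '|' as delimiter.
 * Returns null if the file cannot be opened.
 */
function parseWithFgetcsv(string $filename): ?array
{
    $fh = fopen($filename, 'r');
    if ($fh === false) {
        return null;
    }
    $result = [];
    // Length 0 means no line-length limit; enclosure and escape are passed
    // explicitly with their documented default values
    while (($row = fgetcsv($fh, 0, '|', '"', '\\')) !== false) {
        // A blank line comes back as [null]; skip anything too short to
        // contain AAA, BBB, C and DDD
        if (count($row) < 5) {
            continue;
        }
        // Fields are AAA, '', BBB, C, DDD, '', '' -> skip the empty slot
        list($a, , $b, $c, $d) = $row;
        $result[$a] = ['a' => $a, 'b' => $b, 'c' => $c, 'd' => $d];
    }
    fclose($fh);
    return $result;
}
```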

2 Comments

The disadvantage of fgetcsv() is that I can't check for errors row by row and ignore or report the problematic line. Once a malformed line is detected, fgetcsv() will fail.
Not sure if this will help, but you could use fgets() to read the line, str_getcsv() to parse it, and your own function to handle only the error rows. str_getcsv() should still parse the line much faster than a custom PHP function.
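A sketch of that combination, under stated assumptions: parseWithErrors() is a hypothetical name, a well-formed row is assumed to have exactly 7 fields and a non-empty key, and "reporting" here just means collecting the offending lines by line number for the caller to handle.

```php
<?php
/**
 * Sketch: fgets() to read, str_getcsv() to split, and a separate path for
 * rows that fail a simple shape check. Bad lines are collected in
 * $badLines, keyed by 1-based line number.
 */
function parseWithErrors(string $filename, array &$badLines): ?array
{
    $fh = fopen($filename, 'r');
    if ($fh === false) {
        return null;
    }
    $result = [];
    $lineNo = 0;
    while (($line = fgets($fh)) !== false) {
        $lineNo++;
        $line = rtrim($line, "\r\n");
        $row = str_getcsv($line, '|', '"', '\\');
        // Assumed rule: exactly 7 fields and a non-empty first field
        if (count($row) !== 7 || $row[0] === '') {
            $badLines[$lineNo] = $line; // record it and keep going
            continue;
        }
        list($a, , $b, $c, $d) = $row;
        $result[$a] = ['a' => $a, 'b' => $b, 'c' => $c, 'd' => $d];
    }
    fclose($fh);
    return $result;
}
```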

Hm, I'm not sure how much this will really help, but what about exploding on | and assigning $a = $tokens[0], $b = $tokens[2], etc.? That reduces your explode calls per iteration by one.

You could also achieve a similar thing by using fgetcsv with '|' as your delimiter. Again, not sure how much that's really going to improve things.
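The single-explode idea, sketched for one line; parseLineSingleExplode() is a hypothetical helper, and returning null for lines with fewer than 5 fields is an assumed way to flag bad rows.

```php
<?php
// Sketch: split once on '|' and pick the wanted fields by index
function parseLineSingleExplode(string $line): ?array
{
    // "AAA||BBB|C|DDD||" -> ['AAA', '', 'BBB', 'C', 'DDD', '', '']
    $tokens = explode('|', rtrim($line, "\r\n"));
    if (count($tokens) < 5) {
        return null; // too few fields to be a valid row
    }
    return ['a' => $tokens[0], 'b' => $tokens[2], 'c' => $tokens[3], 'd' => $tokens[4]];
}
```

Because explode() does no quote handling, this stays a plain string split, which is the point of the suggestion.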

1 Comment

Interesting. I was skeptical that you'd get much improvement, but I definitely wouldn't have expected it to be worse. It probably has to do with handling the other rules of the CSV format, e.g. fields that may or may not be wrapped in an enclosure character (a double quote by default). Things like that.
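Enclosure handling is one such rule: a field that starts with the enclosure character is read as a quoted field, so a '|' inside it is not treated as a delimiter. The snippet below shows the behavioral difference on a made-up line; it does not measure how much this costs.

```php
<?php
// Made-up line: the first field is wrapped in double quotes
$line = '"x|y"|z';

// str_getcsv() honors the enclosure and strips the quotes
$csvFields = str_getcsv($line, '|', '"', '\\');
// ['x|y', 'z']

// explode() just splits on every '|'
$plainFields = explode('|', $line);
// ['"x', 'y"', 'z']
```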
