Using grep
Edit: Since I still had my big file around, I also tested Uri Agassi's approach of using grep to get the lines of the file with empty fields:
File.new(filename).grep(/(^,|,(,|$))/)
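To see what the regex catches, here is a quick sketch with made-up sample lines: it matches a leading comma, two adjacent commas, or a trailing comma, i.e. any unquoted empty field. Note the caveat that it works on raw lines, so quoted empty fields like a,"",c are not caught and commas inside quotes could cause false matches.

```ruby
# Which lines the empty-field regex matches (sample lines are made up):
regex = /(^,|,(,|$))/

p !!("a,b,c" =~ regex)   # false: no empty field
p !!(",b,c"  =~ regex)   # true: leading empty field
p !!("a,,c"  =~ regex)   # true: empty middle field
p !!("a,b,"  =~ regex)   # true: trailing empty field
```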
It's about 10 times faster. If you need access to the fields you can use CSV.parse:
require 'csv'
File.new("/tmp/big.csv").grep(/(^,|,(,|$))/).each do |row_string|
  CSV.parse(row_string) do |row|
    puts row[1]
  end
end
Using a native CSV parser
Otherwise, if you have to parse the whole CSV file anyway, the answer is most likely no. Try running your script without the checking part - just reading the CSV rows. You will see no change in running time. This is because most of the time is spent reading and parsing the CSV file.
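You can confirm where the time goes with Ruby's built-in Benchmark module. This is a sketch that generates a small sample file so it runs anywhere; point path at your real CSV to measure actual data:

```ruby
require 'csv'
require 'benchmark'
require 'tempfile'

# Build a small sample file so the sketch is self-contained;
# replace `path` with your real CSV to measure actual data.
sample = Tempfile.new(['big', '.csv'])
10_000.times { sample.puts 'a,b,,d' }
sample.flush
path = sample.path

read_only = Benchmark.realtime do
  CSV.foreach(path) { |row| }            # parsing only
end

read_and_check = Benchmark.realtime do
  CSV.foreach(path) do |row|
    row.each { |column| column.nil? }    # parsing plus the nil check
  end
end

puts format('read only:  %.3fs', read_only)
puts format('read+check: %.3fs', read_and_check)
```

The two timings should come out nearly identical, which shows the check itself costs almost nothing compared to parsing.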
You might wonder if there is a faster CSV library for Ruby. There used to be a gem called FasterCSV, but Ruby 1.9 adopted it as the built-in CSV library, so it probably won't get much faster in pure Ruby.
There is a Ruby gem named excelsior which uses a native CSV parser. You can install it via gem install excelsior and use it like this:
require 'excelsior'
Excelsior::Reader.rows(File.open('/tmp/big.csv')) do |row|
  row.each do |column|
    unless column
      puts "empty field"
    end
  end
end
I tested this code with a file like yours (72 MB, ~30k rows with ~2.5k fields each) and it is about twice as fast. However, it segfaulted after a few lines, so the gem may not be stable.
Using CSV
As you mentioned in your comment, there are a few more idiomatic ways to write this, such as using each instead of the for loop, using unless instead of if !, and indenting with two spaces, which turns it into:
require 'csv'
CSV.foreach('/tmp/big.csv') do |row|
  row.each do |column|
    unless column
      puts "empty field"
    end
  end
end
This won't improve the speed though.
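Since CSV yields unquoted empty fields as nil, a more concise way to flag rows with any empty field is row.include?(nil). A small sketch with inline sample data:

```ruby
require 'csv'

# Inline sample: the second and third rows contain empty fields,
# which the parser returns as nil.
data = "a,b,c\n,b,c\na,,c\n"

CSV.parse(data) do |row|
  puts "empty field in #{row.inspect}" if row.include?(nil)
end
```

This avoids the inner each loop entirely when you only need a yes/no answer per row, though it won't make parsing itself any faster.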