
I'm trying to read a large number of rows from a database (over 100,000) and write them to a CSV file on an Ubuntu VPS. The server doesn't have enough memory for this.

I was thinking about reading 5,000 rows at a time and writing them to the file, then reading another 5,000, and so on.

How should I restructure my current code so that it doesn't consume all the memory?

Here's my code:

def write_rows(emails)
  File.open(file_path, "w+") do |f|
    f << "email,name,ip,created\n"
    emails.each do |l|
      f << [l.email, l.name, l.ip, l.created_at].join(",") + "\n"
    end
  end
end

The method is called from a Sidekiq worker with:

write_rows(user.emails)

Thanks for the help!

1 Answer


The problem here is that when you call emails.each, ActiveRecord loads all the records from the database and keeps them in memory. To avoid this, use the method find_each:

require 'csv'

def write_rows(emails)
  CSV.open(file_path, 'w') do |csv|
    csv << %w{email name ip created}

    # find_each loads the records in batches instead of all at once
    emails.find_each do |email|
      csv << [email.email, email.name, email.ip, email.created_at]
    end
  end
end

By default, find_each loads records in batches of 1,000 at a time. If you want to load batches of 5,000 records, you have to pass the :batch_size option to find_each:

emails.find_each(:batch_size => 5000) do |email|
  ...

More information about find_each (and the related find_in_batches) can be found in the Ruby on Rails Guides.
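For comparison, here is a minimal sketch of find_in_batches under the same assumptions as the code above (the emails relation and the csv handle come from the example); it yields each batch as an array of records rather than yielding records one at a time:

emails.find_in_batches(batch_size: 5000) do |batch|
  # batch is an Array of up to 5000 records
  batch.each do |email|
    csv << [email.email, email.name, email.ip, email.created_at]
  end
end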

I've used the CSV class to write the file instead of joining fields and lines by hand. This is not intended as a performance optimization, since writing to the file shouldn't be the bottleneck here.
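A side benefit (illustrated here with made-up values) is that CSV escapes fields for you, which the hand-rolled join does not:

require 'csv'

# CSV quotes any field that contains a comma, quote, or newline
CSV.generate_line(["jane@example.com", "Doe, Jane", "127.0.0.1"])
# => "jane@example.com,\"Doe, Jane\",127.0.0.1\n"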


4 Comments

Thanks.. what about writing to a csv file? Will the csv gem optimize writing to the file?
@Aljaz Not really; I've used CSV just to avoid joining fields/lines. csv is not a gem, it comes from the Ruby stdlib.
@Aljaz The CSV module will, however, ensure that all of your values are escaped correctly. If there's any chance any of the values from your database will have a comma or newline in them (i.e. if you accept user input and don't have strict validations to reject those characters), you should use the CSV module instead of doing this "manually." Honestly, 100,000 rows is not very many, and the CSV module (which, since 1.9.3, is based on FasterCSV) will do this very quickly.
find_each didn't help in my case. With 700k records it used 1GB of memory for some reason.
