2

Hi I'm trying to import CSV data into a spatially enabled Postgres database. The data is available here. I'm unsure where I went wrong and any help is greatly appreciated! What I'm attempting to do is visualize that data with D3.js and maybe display a heat density of the most libraries per town or something to that degree.

File: lib/tasks/import_incidents_csv.rake

require 'csv'

namespace :import_incidents_csv do

  task :create_incidents => :environment do

    csv_text = File.read('/home/mgmacri/data/PublicLibraryBranchLocations.csv')
    csv = CSV.parse(csv_text, :headers => true)

    csv.each do |row|
      row = row.to_hash.with_indifferent_access
      Moulding.create!(row.to_hash.symbolize_keys)
    end

  end

end


user@server:/spatial_project$: rake import_incidents_csv:create_incidents --trace
** Invoke import_incidents_csv:create_incidents (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute import_incidents_csv:create_incidents
rake aborted!
invalid byte sequence in UTF-8
/usr/lib/ruby/1.9.1/csv.rb:1855:in `sub!'
/usr/lib/ruby/1.9.1/csv.rb:1855:in `block in shift'
/usr/lib/ruby/1.9.1/csv.rb:1849:in `loop'
/usr/lib/ruby/1.9.1/csv.rb:1849:in `shift'
/usr/lib/ruby/1.9.1/csv.rb:1791:in `each'
/usr/lib/ruby/1.9.1/csv.rb:1805:in `to_a'
/usr/lib/ruby/1.9.1/csv.rb:1805:in `read'
/usr/lib/ruby/1.9.1/csv.rb:1379:in `parse'
/home/mgmacri/rails/mymap/lib/tasks/import_incidents_csv.rake:8:in `block (2 levels) in                                     
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:228:in `call'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:228:in `block in execute'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:223:in `each'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:223:in `execute'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:166:in `block in         invoke_with_call_chain'
/usr/lib/ruby/1.9.1/monitor.rb:211:in `mon_synchronize'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:159:in `invoke_with_call_chain'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:152:in `invoke'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:143:in `invoke_task'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:101:in `block (2 levels)     in top_level'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:101:in `each'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:101:in `block in   top_level'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:110:in `run_with_threads'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:95:in `top_level'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:73:in `block in run'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:160:in  `standard_exception_handling'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:70:in `run'
/var/lib/gems/1.9.1/gems/rake-10.0.3/bin/rake:33:in `<top (required)>'
/usr/local/bin/rake:19:in `load'
/usr/local/bin/rake:19:in `<main>'
Tasks: TOP => import_incidents_csv:create_incidents

2 Answers 2

3

Excel encodes the file into ISO-8859-1 and not in UTF-8. So tell Ruby to open the file with read only in ISO-8859-1

file=File.open("input_file", "r:ISO-8859-1")
Sign up to request clarification or add additional context in comments.

2 Comments

Excel uses an ANSI codepage by default, not an ISO charset. These aren't quite the same thing. For example, ISO-8859-1 is similar but not identical to cp1252 (otherwise known as Windows-1252). It's best to use the correct codepage rather than guess at a close-enough encoding - or better yet, save your Excel sheets in UTF-8 using OpenOffice and retain your sanity. Note that Excel will use a different codepage on different systems; eg a central european users will probably send you cp1251 text. See stackoverflow.com/questions/508558/…
See also en.wikipedia.org/wiki/Windows-1252 which explains among other things that Windows-1252 isn't even ANSI-standard even though Windows calls it an "ANSI" codepage.
2

Using postgresql's native CSV import is a couple of orders of magnitude faster than using Ruby's CSV API, and may also avoid the same encoding issues.

For example:

namespace :import_incidents_csv do
  task :create_incidents => :environment do
    ActiveRecord::Base.connection.execute "COPY moulding (name, state, postcode, lat, long) FROM '/home/mgmacri/data/PublicLibraryBranchLocations.csv' DELIMITER ',' CSV;"
  end
end

More info: http://www.postgresql.org/docs/9.2/static/sql-copy.html

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.