3

I'm working on a method to take a CSV file (with headers) and parse it using Ruby CSV.parse, however I only want to save specific columns.

The CSV looks like this:

NAME,SUPERNET_IP,POP_NAME,ADDRESS_BLOCK_START,ADDRESS_BLOCK_END,Service,ISP Service ID,WCC,DUNSID
Retail,186.43.168.0,text1,186.43.168.0,186.43.175.255,XYZ,XYZB00090095,Enabled,227015716
Retail,186.57.80.0,text2,186.57.80.0,186.57.87.255,XYZ,XYXB00090095,Enabled,227015716

and the only fields I want to keep are:

POP_NAME,ADDRESS_BLOCK_START,ADDRESS_BLOCK_END,WCC

Is there a way to parse in the CSV by specific header names, like:

mycsv = CSV.parse(csv_data, {:headers => true, (list of headers to keep here) })

This example is assuming csv_data is a string formed from the example CSV above.

As a stopgap I'm just converting the CSV into an array of arrays but it's not really what I'm after. I'd rather keep it as a CSV object.

myreturnedcsv = []
mycsv = CSV.parse(csv_data, {:headers => true, })
mycsv.each do |row|
  myreturnedcsv.push([row[2], row[3], row[4],row[7]])
end
1

3 Answers 3

6

Please try the smarter_csv gem / parser. This has ability to ignore "columns" in the input (delete columns) https://github.com/tilo/smarter_csv

Sign up to request clarification or add additional context in comments.

3 Comments

this seems like it'll work any idea if I can validate header values using it?
Use SmarterCSV.process with option :key_mapping. Documentation says that -> a hash which maps headers from the CSV file to keys in the result hash. Let me know if it works for you.
just map any columns you don't want to nil in the :key_mapping hash
3

Using only stdlib, you can operate with CSV::Table objects in column mode (instead of mixed mode which is the default). In column mode, iteration methods will yield two element tuples containing the column name and an Array of values for that column.

With that in mind, we could write something like:

# column names to keep
columns_to_keep = %w(POP_NAME ADDRESS_BLOCK_START ADDRESS_BLOCK_END WCC)

# get the data
mycsv = CSV.parse(csv_data, :headers => true)

# change to column mode, filter by column name and change back to default
# mode of operation
mycsv.by_col!.delete_if do |col_name, col_values|
  !columns_to_keep.include?(col_name)
end.by_col_or_row!   

Last step is optional and is only to leave the table object in the default mode in which we can iterate as usual (by row).

I don't actually know if this approach may suffer from perf/mem issues when dealing with large data sets.

You can find more information about row/col/mixed access in the docs for CSV::Table.

Hope it helps.

Comments

0

I'd go about it doing some array slicing:

require 'csv'

csv_data = <<EOT
NAME,SUPERNET_IP,POP_NAME,ADDRESS_BLOCK_START,ADDRESS_BLOCK_END,Service,ISP Service ID,WCC,DUNSID
Retail,186.43.168.0,text1,186.43.168.0,186.43.175.255,XYZ,XYZB00090095,Enabled,227015716
Retail,186.57.80.0,text2,186.57.80.0,186.57.87.255,XYZ,XYXB00090095,Enabled,227015716
EOT

data = []
CSV.parse(csv_data) do |row|
  data << [ *row[2 .. 4], row[-2] ]
end

require 'pp'
pp data

Which returns:

[["POP_NAME", "ADDRESS_BLOCK_START", "ADDRESS_BLOCK_END", "WCC"],
["text1", "186.43.168.0", "186.43.175.255", "Enabled"],
["text2", "186.57.80.0", "186.57.87.255", "Enabled"]]

I let CSV return the headers, to make it easier to see what the code is doing. Turn the headers off using the normal new option.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.