I have a CSV with a number of filenames and dates:
"doc_1.doc", "date1"
"doc_2.doc", "date2"
"doc_5.doc", "date5"
The issue is that there are many gaps between the file numbers, e.g. between doc_2 and doc_5.
I am trying to write a script that parses the CSV, compares each row with the previous one, and fills in any missing entries.
In this example, it would add:
"doc_3.doc", "date copied from date2"
"doc_4.doc", "date copied from date2"
I'm writing this script in Ruby because I'm trying to learn the language. I'm clearly misunderstanding how Ruby's looping works, since it doesn't use the typical 'for' loops that are common in PHP and similar languages.
Here is my code so far. Any help with the loop itself would be greatly appreciated!
#!/usr/bin/env ruby
require 'csv'
# Load file
csv_fname = './upload-list-docs.csv'
# Parsing function
def parse_csv(csv_fname)
  uploads = []
  last_number = 0
  # Regex to find number in doc_XXX.YYY
  regex_find_number = /(?<=\_)(.*?)(?=\.)/
  csv_content = CSV.read(csv_fname)
  # Skip header row
  csv_content.shift
  csv_content.each do |row|
    current_number = row[0].match regex_find_number
    current_date = row[1]
    last_date = current_date
    until last_number == current_number do
      uploads << [last_number, last_date]
      last_number += 1
    end
  end
  return uploads
end
puts parse_csv(csv_fname)
And some sample CSV:
"file_name","date"
"doc_1.jpg","2011-05-11 09:16:05.000000000"
"doc_3.doc","2011-05-11 10:10:36.000000000"
"doc_4.doc","2011-05-11 10:17:19.000000000"
"doc_6.doc","2011-05-11 10:58:35.000000000"
"doc_7.pdf","2011-05-11 11:16:22.000000000"
"doc_8.pdf","2011-05-11 11:19:29.000000000"
"doc_9.docx","2011-05-11 11:40:03.000000000"
"doc_13.pdf","2011-05-11 12:26:32.000000000"
"doc_14.docx","2011-05-11 12:34:50.000000000"
"doc_15.doc","2011-05-11 12:40:12.000000000"
"doc_16.doc","2011-05-11 13:03:11.000000000"
"doc_17.doc","2011-05-11 13:03:58.000000000"
"doc_19.pdf","2011-05-11 13:25:07.000000000"
"doc_20.rtf","2011-05-11 13:34:26.000000000"
"doc_21.rtf","2011-05-11 13:35:25.000000000"
"doc_24.doc","2011-05-11 13:49:02.000000000"
"doc_25.doc","2011-05-11 14:05:04.000000000"
"doc_26.pdf","2011-05-11 14:18:26.000000000"
"doc_27.rtf","2011-05-11 14:30:19.000000000"
"doc_28.doc","2011-05-11 14:33:13.000000000"
"doc_29.jpg","2011-05-11 15:07:27.000000000"
"doc_30.doc","2011-05-11 15:22:30.000000000"
"doc_31.doc","2011-05-11 15:31:07.000000000"
"doc_34.doc","2011-05-11 15:51:56.000000000"
"doc_35.doc","2011-05-11 15:55:15.000000000"
"doc_36.doc","2011-05-11 16:06:46.000000000"
"doc_38.wps","2011-05-11 16:21:08.000000000"
"doc_39.doc","2011-05-11 16:30:57.000000000"
"doc_40.doc","2011-05-11 16:41:55.000000000"
"doc_43.JPG","2011-05-11 17:03:40.000000000"
"doc_46.doc","2011-05-11 17:28:13.000000000"
"doc_51.doc","2011-05-11 17:50:34.000000000"
"doc_52.doc","2011-05-11 18:03:13.000000000"
"doc_53.doc","2011-05-11 18:43:48.000000000"
"doc_54.doc","2011-05-11 18:54:45.000000000"
"doc_55.doc","2011-05-11 19:31:03.000000000"
"doc_56.doc","2011-05-11 19:31:23.000000000"
"doc_57.doc","2011-05-11 20:17:38.000000000"
"doc_59.jpg","2011-05-11 20:22:55.000000000"
"doc_61.pdf","2011-05-11 21:14:52.000000000"
until last_number >= current_number do ... that will give you a clue. current_number doesn't have to change, since last_number is changing. As long as one of them is changing (and getting closer to the termination condition), the loop will eventually end.
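To make that hint concrete, here is a minimal sketch of one way the loop could look; it is not the only approach. One thing to watch for: `match` returns a MatchData object rather than an Integer, so the comparison against `last_number` in the question can never be true until the number is converted with `to_i`. The helper name `fill_gaps` is hypothetical. The sketch assumes the header names `file_name` and `date` from the sample CSV, and assumes filled-in rows get a `.doc` extension and copy the date of the previous real row, as in the question's example.

```ruby
require 'csv'

# Hypothetical helper: returns [file_name, date] pairs with numbering gaps filled.
# Assumption: filler rows are named doc_N.doc and reuse the previous row's date.
def fill_gaps(csv_text)
  # headers: true consumes the header row and lets us index rows by column name
  rows = CSV.parse(csv_text, headers: true)
  filled = []
  last_number = nil
  last_date = nil

  rows.each do |row|
    # String#[] with a regex and a group index returns the captured digits;
    # to_i turns them into an Integer so the comparison below is Integer vs Integer.
    current_number = row['file_name'][/_(\d+)\./, 1].to_i

    unless last_number.nil?
      next_number = last_number + 1
      # next_number grows on every pass, so it must eventually reach
      # current_number and the loop terminates.
      until next_number >= current_number
        filled << ["doc_#{next_number}.doc", last_date]
        next_number += 1
      end
    end

    filled << [row['file_name'], row['date']]
    last_number = current_number
    last_date = row['date']
  end

  filled
end

sample = <<~SAMPLE
  "file_name","date"
  "doc_1.jpg","2011-05-11 09:16:05.000000000"
  "doc_3.doc","2011-05-11 10:10:36.000000000"
  "doc_4.doc","2011-05-11 10:17:19.000000000"
  "doc_6.doc","2011-05-11 10:58:35.000000000"
SAMPLE

fill_gaps(sample).each { |name, date| puts "#{name},#{date}" }
```

With this sample, the result contains an added doc_2.doc row carrying doc_1's date and an added doc_5.doc row carrying doc_4's date. Using `>=` rather than `==` in the `until` condition also keeps the loop from running forever if the input is ever out of order or contains a duplicate number.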