Sorry if this has already been asked.
- I have about 1 million text documents stored in PostgreSQL.
- I am trying to find the ones that contain certain words, for example cancer, died, or heart_attack. The list of words is quite long.
- A document only needs to contain one of the words.
- If a document contains one of the words, I then copy its text to a file in a different folder.
My current code is:
require "fileutils"

directory = "disease" # Directory that will hold the matching documents
FileUtils.mkpath(directory) # Makes the directory if it doesn't exist
cancer = Eightk.where("text ilike '%cancer%'")
died = Eightk.where("text ilike '%died%'")
cancer.each do |filing| # each filing is one Eightk record
  filename = "#{directory}/#{filing.doc_id}.html"
  File.open(filename, "w") { |f| f.puts filing.text } # block form closes the file
  puts "Storing #{filing.doc_id}..."
end
died.each do |filing| # same loop again, only the keyword differs
  filename = "#{directory}/#{filing.doc_id}.html"
  File.open(filename, "w") { |f| f.puts filing.text }
  puts "Storing #{filing.doc_id}..."
end
But this is not working well, for the following reasons:
- It doesn't match the exact word: ilike '%died%' also matches longer words that merely contain it.
- It is very time consuming, since it involves lots of copying the same code and changing just one word.
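To illustrate the exact-word problem: a substring test (which is effectively what ilike '%word%' does) also fires on longer words, whereas a regex with \b word boundaries does not. This is a minimal sketch in plain Ruby; the helper names and the sample sentence are hypothetical and not part of the code above.

```ruby
# Hypothetical helpers for illustration only.

# Mimics ilike '%word%': a case-insensitive substring test.
def substring_hit?(text, word)
  text.downcase.include?(word.downcase)
end

# Case-insensitive whole-word test using \b word boundaries.
def whole_word_hit?(text, word)
  text.match?(/\b#{Regexp.escape(word)}\b/i)
end

# Made-up sample sentence.
SAMPLE_TEXT = "The results were studied by a precancerous-cell research lab."

puts substring_hit?(SAMPLE_TEXT, "died")    # prints true ("studied" contains "died")
puts whole_word_hit?(SAMPLE_TEXT, "died")   # prints false
puts whole_word_hit?(SAMPLE_TEXT, "cancer") # prints false
```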
So I have tried using Regexp.union as follows, but I am a bit lost:
directory = "disease" # Directory that will hold the matching documents
FileUtils.mkpath(directory) # Makes the directory if it doesn't exist
keywords = [/dead/,/killed/,/cancer/]
re = Regexp.union(keywords) # Regexp is a constant, so it must be capitalized
So I am trying to search the text files for these keywords and then copy the text documents.
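A minimal sketch of one way the Regexp.union idea could be completed, under some assumptions: the keyword list, the `store_matching` helper, and the `Doc` stand-in records are all hypothetical, and the stand-ins exist only so the sketch runs without a database. It builds one case-insensitive, whole-word pattern from the list and writes every matching document once.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical keyword list; multi-word phrases are allowed.
KEYWORDS = ["cancer", "died", "heart attack"].freeze

# Regexp.union escapes plain strings and joins them with "|".
# Its source is interpolated (rather than the Regexp itself) so the
# outer /i flag applies; \b anchors matches at word boundaries.
KEYWORD_RE = /\b(?:#{Regexp.union(KEYWORDS).source})\b/i

# Writes every document whose text matches +pattern+ into +directory+
# and returns the ids that were stored. +documents+ is anything that
# responds to #each and yields objects with #doc_id and #text.
def store_matching(documents, pattern, directory)
  FileUtils.mkdir_p(directory)
  stored = []
  documents.each do |doc|
    next unless doc.text.match?(pattern)
    File.write(File.join(directory, "#{doc.doc_id}.html"), doc.text)
    stored << doc.doc_id
  end
  stored
end

# Stand-in records (assumption) in place of Eightk rows.
Doc = Struct.new(:doc_id, :text)
SAMPLE_DOCS = [
  Doc.new(1, "The patient died in 2009."),
  Doc.new(2, "Quarterly earnings were studied closely."),
  Doc.new(3, "He suffered a Heart Attack last year."),
].freeze

Dir.mktmpdir do |dir|
  p store_matching(SAMPLE_DOCS, KEYWORD_RE, dir) # prints [1, 3]
end
```

With ActiveRecord, passing `Eightk.find_each` (which returns an Enumerator when called without a block) as `documents` should read the rows in batches rather than loading all of them at once. Filtering in Ruby still pulls every row out of the database, so it trades speed for simplicity.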
Any help is really appreciated.