If it my understanding that we are given an array dictionary of words containing no whitespace, and a string str, and are to construct a hash whose keys are elements of dictionary and whose values equal the number of non-overlapping1 substrings of str for which the key is a substring. The hash returned should exclude keys having values of zero.
This answer addresses the situation where, in:
substrings(str, dictionary)
dictionary is large, str is not overly-large (the meaning of which I elaborate later) and efficiency is important.
We begin by defining a helper method, whose purpose will become clear.
def substr_counts(str)
str.split.each_with_object(Hash.new(0)) do |word,h|
(1..word.size).each do |sub_len|
(0..word.size-sub_len).each do |start_idx|
h[word[start_idx,sub_len]] += 1
end
end
end
end
For the example given in the question,
substr_counts("go going")
#=> {"g"=>3, "o"=>2, "go"=>2, "i"=>1, "n"=>1, "oi"=>1, "in"=>1, "ng"=>1,
# "goi"=>1, "oin"=>1, "ing"=>1, "goin"=>1, "oing"=>1, "going"=>1}
As seen, this method breaks str into words, computes each substring of each word and returns a hash whose keys are the substrings and whose values are the total numbers of non-overlapping substrings in all words that contain that substring.
The desired hash can now be constructed quickly.
def cover_count(str, dictionary)
h = substr_counts(str)
dictionary.each_with_object({}) do |word,g|
g[word] = h[word] if h.key?(word)
end
end
dictionary = ["below", "down", "go", "going", "horn", "how", "howdy",
"it", "i", "low", "own", "part", "partner", "sit"]
cover_count("go going", dictionary)
#=> {"go"=>2, "going"=>1, "i"=>1}
Another example:
str = "lowner partnership lownliest"
cover_count(str, dictionary)
#=> {"i"=>2, "low"=>2, "own"=>2, "part"=>1, "partner"=>1}
Here,
substr_counts(str)
#=> {"l"=>3, "o"=>2, "w"=>2, "n"=>3, "e"=>3, "r"=>3, "lo"=>2,
# ...
# "wnliest"=>1, "lownlies"=>1, "ownliest"=>1, "lownliest"=>1}
substr_counts(str).size
#=> 109
There is an obvious trade-off here. If str is long, and especially if it contains long words2, it will take too long to build h to justify the savings of not having to determine, for each word in dictionary, if that word is contained in each word of str. If, however, it's worth it to construct h, the overall time savings could be substantial.
1. By "non-overlapping" I mean that if str equals 'bobobo' it contains one, not two, substrings 'bobo'.
2. substr_counts("antidisestablishmentarianism").size #=> 385, not so bad.
%w[ a b c ... ]which gives you the same result without all the extra quoting and commas.