1

I have an array of hashes built like this:

grapes_matched << { part: part, grape: grape_match }

I would like to:

  • keep the current array sorting
  • remove items in array if grape_match.name (Active Record item) includes another grape_match.name shorter.

For example, imagine my array of hashes is:

{ part:"toto", grape: AR Grape with name: "Cabernet" }
{ part:"titi", grape: AR Grape with name: "Cabernet Sauvignon" }
{ part:"tutu", grape: AR Grape with name: "Merlot" }

As the second "Cabernet Sauvignon" includes the first "Cabernet", I want to remove the first of the array.

If possible, I would like to not build another array and keep my array of hashes without changing the structure (not like the code below).

I have something very ugly at the time:

grapes_matched.each do |grape_matched|
            temp_grape = grape_matched[:grape]
            temp_grape_name = I18n.transliterate(temp_grape.name).downcase 
            # does the temp grape name is included in one of previous grapes
            # first grape
            grapes_founds << temp_grape if grapes_founds.length == 0
            # other grapes
            grapes_founds.each do |grape_found|
              grapes_founds << temp_grape if !I18n.transliterate(grape_found.name).downcase.include? temp_grape_name
            end
          end

I'm quite sure this can be done with less lines of code in Ruby and keeping the initial array of hashes.

Thanks in advance.

2 Answers 2

1

My objective is to implement a reasonably-efficient algorithm.

Let's first simplify and rearrange the array somewhat.

grapes = [{ part:"toto", grape: "Cabernet" },
          { part:"tutu", grape: "Merlot" },
          { part:"titi", grape: "Cabernet Sauvignon" }]

We can then do the following to obtain the desired array in a reasonably-efficient manner.

grapes.each_with_index.
       sort_by { |g,_i| -g[:grape].size }.
       each_with_object([]) { |(g,i),a| a << [g,i] unless a.any? { |f,_i|
         f[:grape].include?(g[:grape]) } }.
       sort_by(&:last).
       map(&:first)
  #=> [{:part=>"tutu", :grape=>"Merlot"},
  #    {:part=>"titi", :grape=>"Cabernet Sauvignon"}]

The steps are as follows.

Add an index to each hash so that their original order in grapes can be determined later.

e = grapes.each_with_index
  #=> #<Enumerator: [{:part=>"toto", :grape=>"Cabernet"},
  #                  {:part=>"tutu", :grape=>"Merlot"},
  #                  {:part=>"titi", :grape=>"Cabernet Sauvignon"}]:each_with_index>

Sort the hash/index pairs by decreasing size of g[:grape].

 b = e.sort_by { |g,_i| -g[:grape].size }
   #=> [[{:part=>"titi", :grape=>"Cabernet Sauvignon"}, 2],
   #    [{:part=>"toto", :grape=>"Cabernet"}, 0],
   #    [{:part=>"tutu", :grape=>"Merlot"}, 1]]

Add each hash/index pair [g,i] to an initially-empty array a, unless f[:grape] includes g[:grape] for a hash f already in a.

c = b.each_with_object([]) { |(g,i),a| a << [g,i] unless a.any? { |f,_i|
         f[:grape].include?(g[:grape]) } }
  #=> [[{:part=>"titi", :grape=>"Cabernet Sauvignon"}, 2],
  #    [{:part=>"tutu", :grape=>"Merlot"}, 1]]

To obtain the desired order of the hashes in c, sort them by their indexes in the original array grapes (which has no effect for this example).

d = c.sort_by(&:last)
  #=> [[{:part=>"tutu", :grape=>"Merlot"}, 1],
  #    [{:part=>"titi", :grape=>"Cabernet Sauvignon"}, 2]]

Remove the indices.

d.map(&:first)
  #=> [{:part=>"tutu", :grape=>"Merlot"},
  #    {:part=>"titi", :grape=>"Cabernet Sauvignon"}]

Depending on requirements, it may be preferable to replace f[:grape].include?(g[:grape]) with f[:grape].begin_with?(g[:grape]) || f[:grape].end_with?(g[:grape]).

A simple benchmark comparing @Max' solution with mine follows.

def max_way(grapes_matched)
  grapes_matched.select do |grape_matched|
    grapes_matched.none? { |gm| gm[:grape] != grape_matched[:grape] && 
      grape_matched[:grape].include?(gm[:grape]) }
  end
end

def cary_way(grapes)
  grapes.each_with_index.
         sort_by { |g,_i| -g[:grape].size }.
         each_with_object([]) { |(g,i),a| a << [g,i] unless a.any? { |f,_i|
           f[:grape].include?(g[:grape]) } }.
         sort_by(&:last).
         map(&:first)
end

ALPHA = ('a'..'z').to_a
def rnd5
  '     '.gsub(' ') { ALPHA.sample }
end

def grapes(n, m)
  n.times.each_with_object([]) do |i,a|
    s1, s2 = rnd5, rnd5
    a << { grape: "%s %s" % [s1, s2] }
    a << { grape: i.even? ? s1 : s2 } if i < m
  end.shuffle
end

require 'fruity'

def bench(n, m)
  (grapes_matched = grapes(n, m)).size
  compare do
    Max  { max_way(grapes_matched) }
    Cary { cary_way(grapes_matched) }
  end
end

bench   95, 5
Running each test once. Test will take about 1 second.
Cary is faster than Max by 3x ± 1.0

bench  950, 50
Running each test once. Test will take about 13 seconds.
Cary is faster than Max by 3x ± 1.0

bench  950, 500
Running each test once. Test will take about 23 seconds.
Cary is faster than Max by 4x ± 0.1
Sign up to request clarification or add additional context in comments.

5 Comments

Hello Cary, First, I didn't have the time to test Max answser that's why I didn't reply at the time. To your proposal: I want to keep the longest grape, not the shortest. Cabernet Sauvignon should be kept, and Cabernet removed. One more thing: I doesn't want to use start_with? but include? because in some other cases, the grape could be "toto tutu" and "tutu"... and I want to keep "toto tutu" that doesn't start with "tutu". ;)
Thanks Cary, seems to work. But the Max answer that is shorter too. ;)
alex, when I wrote my answer @Max had already submitted his (which I upvoted--you can too). I therefore concentrated on algorithmic efficiency.
alex, while I made changes to keep longer names, that is not what you seem to be asking in the question: "remove items in array if grape_match.name includes another grape_match.name shorter." I also noticed that @Max's solution keeps the shorter names. I don't think that has much effect on my benchmark, however.
Hello Cary, First, thanks for all your efforts. I tested and adjust your solution to (g[:grape].name) as g[:grape] is an ActiveRecord object. I didn't make speed tests as you, but I trust you. ;-) So I will accept your solution because this code will be run in a cron with thousands of objects.
1

It can be much shorter:

grapes_founds = grapes_matched.select do |grape_matched|
  grapes_matched.none? { |gm| gm[:grape] != grape_matched[:grape] && grape_matched[:grape].include?(gm[:grape]) }
end

In English: select all grapes for which no other grape has a different name which is included in this grape's name.

It's not entirely clear to me what your data structure is and how your strings are normalized so you might have to massage this into exactly the right form.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.