Browse array of hashes to remove string included in other string in Ruby

Question

I have an array of hashes built like this:

grapes_matched << { part: part, grape: grape_match }

I would like to:

keep the current array sorting
remove items in array if grape_match.name (Active Record item) includes another grape_match.name shorter.

For example, imagine my array of hashes is:

{ part:"toto", grape: AR Grape with name: "Cabernet" }
{ part:"titi", grape: AR Grape with name: "Cabernet Sauvignon" }
{ part:"tutu", grape: AR Grape with name: "Merlot" }

As the second "Cabernet Sauvignon" includes the first "Cabernet", I want to remove the first of the array.

If possible, I would like to not build another array and keep my array of hashes without changing the structure (not like the code below).

I have something very ugly at the time:

grapes_matched.each do |grape_matched|
            temp_grape = grape_matched[:grape]
            temp_grape_name = I18n.transliterate(temp_grape.name).downcase 
            # does the temp grape name is included in one of previous grapes
            # first grape
            grapes_founds << temp_grape if grapes_founds.length == 0
            # other grapes
            grapes_founds.each do |grape_found|
              grapes_founds << temp_grape if !I18n.transliterate(grape_found.name).downcase.include? temp_grape_name
            end
          end

I'm quite sure this can be done with less lines of code in Ruby and keeping the initial array of hashes.

Thanks in advance.

Cary Swoveland · Accepted Answer · 2017-12-19 05:41:29Z

1

My objective is to implement a reasonably-efficient algorithm.

Let's first simplify and rearrange the array somewhat.

grapes = [{ part:"toto", grape: "Cabernet" },
          { part:"tutu", grape: "Merlot" },
          { part:"titi", grape: "Cabernet Sauvignon" }]

We can then do the following to obtain the desired array in a reasonably-efficient manner.

grapes.each_with_index.
       sort_by { |g,_i| -g[:grape].size }.
       each_with_object([]) { |(g,i),a| a << [g,i] unless a.any? { |f,_i|
         f[:grape].include?(g[:grape]) } }.
       sort_by(&:last).
       map(&:first)
  #=> [{:part=>"tutu", :grape=>"Merlot"},
  #    {:part=>"titi", :grape=>"Cabernet Sauvignon"}]

The steps are as follows.

Add an index to each hash so that their original order in grapes can be determined later.

e = grapes.each_with_index
  #=> #<Enumerator: [{:part=>"toto", :grape=>"Cabernet"},
  #                  {:part=>"tutu", :grape=>"Merlot"},
  #                  {:part=>"titi", :grape=>"Cabernet Sauvignon"}]:each_with_index>

Sort the hash/index pairs by decreasing size of g[:grape].

 b = e.sort_by { |g,_i| -g[:grape].size }
   #=> [[{:part=>"titi", :grape=>"Cabernet Sauvignon"}, 2],
   #    [{:part=>"toto", :grape=>"Cabernet"}, 0],
   #    [{:part=>"tutu", :grape=>"Merlot"}, 1]]

Add each hash/index pair [g,i] to an initially-empty array a, unless f[:grape] includes g[:grape] for a hash f already in a.

c = b.each_with_object([]) { |(g,i),a| a << [g,i] unless a.any? { |f,_i|
         f[:grape].include?(g[:grape]) } }
  #=> [[{:part=>"titi", :grape=>"Cabernet Sauvignon"}, 2],
  #    [{:part=>"tutu", :grape=>"Merlot"}, 1]]

To obtain the desired order of the hashes in c, sort them by their indexes in the original array grapes (which has no effect for this example).

d = c.sort_by(&:last)
  #=> [[{:part=>"tutu", :grape=>"Merlot"}, 1],
  #    [{:part=>"titi", :grape=>"Cabernet Sauvignon"}, 2]]

Remove the indices.

d.map(&:first)
  #=> [{:part=>"tutu", :grape=>"Merlot"},
  #    {:part=>"titi", :grape=>"Cabernet Sauvignon"}]

Depending on requirements, it may be preferable to replace f[:grape].include?(g[:grape]) with f[:grape].begin_with?(g[:grape]) || f[:grape].end_with?(g[:grape]).

A simple benchmark comparing @Max' solution with mine follows.

def max_way(grapes_matched)
  grapes_matched.select do |grape_matched|
    grapes_matched.none? { |gm| gm[:grape] != grape_matched[:grape] && 
      grape_matched[:grape].include?(gm[:grape]) }
  end
end

def cary_way(grapes)
  grapes.each_with_index.
         sort_by { |g,_i| -g[:grape].size }.
         each_with_object([]) { |(g,i),a| a << [g,i] unless a.any? { |f,_i|
           f[:grape].include?(g[:grape]) } }.
         sort_by(&:last).
         map(&:first)
end

ALPHA = ('a'..'z').to_a
def rnd5
  '     '.gsub(' ') { ALPHA.sample }
end

def grapes(n, m)
  n.times.each_with_object([]) do |i,a|
    s1, s2 = rnd5, rnd5
    a << { grape: "%s %s" % [s1, s2] }
    a << { grape: i.even? ? s1 : s2 } if i < m
  end.shuffle
end

require 'fruity'

def bench(n, m)
  (grapes_matched = grapes(n, m)).size
  compare do
    Max  { max_way(grapes_matched) }
    Cary { cary_way(grapes_matched) }
  end
end

bench   95, 5
Running each test once. Test will take about 1 second.
Cary is faster than Max by 3x ± 1.0

bench  950, 50
Running each test once. Test will take about 13 seconds.
Cary is faster than Max by 3x ± 1.0

bench  950, 500
Running each test once. Test will take about 23 seconds.
Cary is faster than Max by 4x ± 0.1

edited Dec 19, 2017 at 5:41

answered Dec 16, 2017 at 7:40

Cary Swoveland

111k6 gold badges69 silver badges105 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

alex.bour Over a year ago

Hello Cary, First, I didn't have the time to test Max answser that's why I didn't reply at the time. To your proposal: I want to keep the longest grape, not the shortest. Cabernet Sauvignon should be kept, and Cabernet removed. One more thing: I doesn't want to use start_with? but include? because in some other cases, the grape could be "toto tutu" and "tutu"... and I want to keep "toto tutu" that doesn't start with "tutu". ;)

alex.bour Over a year ago

Thanks Cary, seems to work. But the Max answer that is shorter too. ;)

Cary Swoveland Over a year ago

alex, when I wrote my answer @Max had already submitted his (which I upvoted--you can too). I therefore concentrated on algorithmic efficiency.

Cary Swoveland Over a year ago

alex, while I made changes to keep longer names, that is not what you seem to be asking in the question: "remove items in array if grape_match.name includes another grape_match.name shorter." I also noticed that @Max's solution keeps the shorter names. I don't think that has much effect on my benchmark, however.

alex.bour Over a year ago

Hello Cary, First, thanks for all your efforts. I tested and adjust your solution to (g[:grape].name) as g[:grape] is an ActiveRecord object. I didn't make speed tests as you, but I trust you. ;-) So I will accept your solution because this code will be run in a cron with thousands of objects.

Max · Accepted Answer · 2017-12-15 19:18:12Z

1

It can be much shorter:

grapes_founds = grapes_matched.select do |grape_matched|
  grapes_matched.none? { |gm| gm[:grape] != grape_matched[:grape] && grape_matched[:grape].include?(gm[:grape]) }
end

In English: select all grapes for which no other grape has a different name which is included in this grape's name.

It's not entirely clear to me what your data structure is and how your strings are normalized so you might have to massage this into exactly the right form.

edited Dec 15, 2017 at 19:18

answered Dec 15, 2017 at 19:11

Max

22.5k7 gold badges55 silver badges79 bronze badges

Collectives™ on Stack Overflow

Browse array of hashes to remove string included in other string in Ruby

2 Answers 2

5 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

5 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related