How to remove duplicate rows in an array of hashes in Ruby on Rails

Question

I am using Ruby 2.6 in my application.

I want to remove duplicate elements from an array of hashes. Here is my input:

array_of_hashes = [
{"Date"=> "2019-05-6", "ID" => 100, "Rate" => 10, "Count" => 1},
{"Date"=> "2019-05-6", "ID" => 100, "Rate" => nil, "Count" => 0},
{"Date"=> "2019-05-6", "ID" => 101, "Rate" => 25, "Count" => 3},
{"Date"=> "2019-05-6", "ID" => 102, "Rate" => nil, "Count" => 0},
{"Date"=> "2019-05-6", "ID" => 102, "Rate" => 35, "Count" => 0},
{"Date"=> "2019-05-6", "ID" => 103, "Rate" => 20, "Count" => 6}
]

I am building key/value pairs from these hashes for my application:

result = array_of_hashes.map { |row| [[row['ID'], row['Date']], row] }.to_h

If two records have the same "ID" and "Date" values, I want to keep the row where "Rate" is not nil. The order of the input records may vary. Here are my actual and expected results.

Actual Result:

 {[100, "2019-05-6"]=>{"Date"=>"2019-05-6", "ID"=>100, "Rate"=>nil, "Count"=>0},
 [101, "2019-05-6"]=>{"Date"=>"2019-05-6", "ID"=>101, "Rate"=>25, "Count"=>3},
 [102, "2019-05-6"]=>{"Date"=>"2019-05-6", "ID"=>102, "Rate"=>35, "Count"=>0},
 [103, "2019-05-6"]=>{"Date"=>"2019-05-6", "ID"=>103, "Rate"=>20, "Count"=>6}}

Expected result:

 {[100, "2019-05-6"]=>{"Date"=>"2019-05-6", "ID"=>100, "Rate"=>10, "Count"=>1}, 
 [101, "2019-05-6"]=>{"Date"=>"2019-05-6", "ID"=>101, "Rate"=>25, "Count"=>3},
 [102, "2019-05-6"]=>{"Date"=>"2019-05-6", "ID"=>102, "Rate"=>35, "Count"=>0},
 [103, "2019-05-6"]=>{"Date"=>"2019-05-6", "ID"=>103, "Rate"=>20, "Count"=>6}}

How can I get the above expected result?

1. Can the "expected result" contain a value (hash) for which Rate = nil? 2. Can array_of_hashes contain two elements having the same values for "ID" and "Date" where neither has a nil value for "Rate"? If "yes", which should be selected?
– Cary Swoveland, Commented May 8, 2019 at 16:09

engineersmnky · Accepted Answer · 2019-05-08 16:11:40Z

3

Here is another group_by option:

array_of_hashes.group_by {|h| h.values_at("ID","Date")}.transform_values do |v|   
  v.find {|r| r["Rate"]}
end

#=> {[100, "2019-05-6"]=>{"Date"=>"2019-05-6", "ID"=>100, "Rate"=>10, "Count"=>1}, 
#    [101, "2019-05-6"]=>{"Date"=>"2019-05-6", "ID"=>101, "Rate"=>25, "Count"=>3}, 
#    [102, "2019-05-6"]=>{"Date"=>"2019-05-6", "ID"=>102, "Rate"=>35, "Count"=>0}, 
#    [103, "2019-05-6"]=>{"Date"=>"2019-05-6", "ID"=>103, "Rate"=>20, "Count"=>6}}

Group by "ID" and "Date", then transform the hash values to the first hash in each group where "Rate" is not nil.

If multiple values are acceptable then find_all or select could be substituted for find.

If you want the original structure (an array of hashes) maintained, just add .values to the end.
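
A minimal sketch of those two variations, assuming a trimmed copy of the question's input (the variable names rows, grouped, all_rated and deduped are only illustrative):

```ruby
# Sample rows from the question, trimmed to the two IDs that have duplicates.
rows = [
  { "Date" => "2019-05-6", "ID" => 100, "Rate" => 10,  "Count" => 1 },
  { "Date" => "2019-05-6", "ID" => 100, "Rate" => nil, "Count" => 0 },
  { "Date" => "2019-05-6", "ID" => 102, "Rate" => nil, "Count" => 0 },
  { "Date" => "2019-05-6", "ID" => 102, "Rate" => 35,  "Count" => 0 }
]

grouped = rows.group_by { |h| h.values_at("ID", "Date") }

# select instead of find: keep every row with a truthy "Rate" per key.
all_rated = grouped.transform_values { |v| v.select { |r| r["Rate"] } }

# find plus .values: one row per key, returned as a plain array of hashes.
deduped = grouped.transform_values { |v| v.find { |r| r["Rate"] } }.values
```

Here deduped is an array again, so it can be passed to code that expects the same shape as array_of_hashes.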

answered May 8, 2019 at 16:11

engineersmnky


4 Comments

Cary Swoveland Over a year ago

...or !r["Rate"].nil? to read better (?) and not worry about the value of "Rate" being false (however unlikely that may be).

engineersmnky Over a year ago

@CarySwoveland Really, you think that reads better? I would prefer v.lazy.reject {|r| r["Rate"].nil? }.first over that. It has the same effect as find: the first hash with a non-nil rate is returned without regard for the other hashes in the group.

Cary Swoveland Over a year ago

"Reads better" because when I see {|r| r["Rate"]} the question, "what about false?" immediately comes to mind and requires processing. My "(?)" reflects the need for !. What I'd really like is {|r| r["Rate"].non_nil? }.

engineersmnky Over a year ago

@CarySwoveland you could go with something super ugly like r unless r['Rate'].nil?
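
To make the nil versus false point in these comments concrete, here is a small sketch. The group below is hypothetical (the question's data never has a false "Rate"); it only shows where the three predicates diverge.

```ruby
# Hypothetical group in which one row carries a false "Rate".
group = [
  { "ID" => 100, "Rate" => nil },
  { "ID" => 100, "Rate" => false },
  { "ID" => 100, "Rate" => 10 }
]

# Truthiness test: skips both nil and false.
truthy_pick = group.find { |r| r["Rate"] }

# Explicit nil test: skips nil only, so the false row is returned.
non_nil_pick = group.find { |r| !r["Rate"].nil? }

# Lazy reject/first: same result as the explicit nil test, and it also
# stops at the first match instead of filtering the whole group.
lazy_pick = group.lazy.reject { |r| r["Rate"].nil? }.first
```

With the question's data (rates are either nil or numbers) all three return the same row.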

Cary Swoveland · Accepted Answer · 2019-05-08 17:07:09Z

2

We can construct the desired hash by making a single pass through array_of_hashes.

array_of_hashes.each_with_object({}) do |g,h|
  k = [g['ID'], g['Date']]
  h.update(k=>g) unless h.key?(k) && h[k]['Rate'] != nil
end
  #=> {[100, "2019-05-6"]=>{"Date"=>"2019-05-6", "ID"=>100, "Rate"=>10, "Count"=>1},
  #    [101, "2019-05-6"]=>{"Date"=>"2019-05-6", "ID"=>101, "Rate"=>25, "Count"=>3},
  #    [102, "2019-05-6"]=>{"Date"=>"2019-05-6", "ID"=>102, "Rate"=>35, "Count"=>0},
  #    [103, "2019-05-6"]=>{"Date"=>"2019-05-6", "ID"=>103, "Rate"=>20, "Count"=>6}}

This assumes that if two elements of array_of_hashes match on the values of 'ID' and 'Date', and neither has a value of nil for 'Rate', the first of the two hashes is retained. If the latter of the two should be retained, change the second line of the block to:

h.update(k=>g) unless h.key?(k) && g['Rate'].nil?
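
The difference between the two conditions only shows up when both rows for a key have a non-nil 'Rate'. A short sketch with hypothetical input (the second row's values are invented for illustration):

```ruby
# Hypothetical input: both rows for the same key have a non-nil "Rate".
rows = [
  { "Date" => "2019-05-6", "ID" => 100, "Rate" => 10, "Count" => 1 },
  { "Date" => "2019-05-6", "ID" => 100, "Rate" => 15, "Count" => 2 }
]

# Variant 1: skip the update once a non-nil "Rate" is stored, so the first wins.
keep_first = rows.each_with_object({}) do |g, h|
  k = [g["ID"], g["Date"]]
  h.update(k => g) unless h.key?(k) && h[k]["Rate"] != nil
end

# Variant 2: skip the update only when the incoming "Rate" is nil, so the last wins.
keep_last = rows.each_with_object({}) do |g, h|
  k = [g["ID"], g["Date"]]
  h.update(k => g) unless h.key?(k) && g["Rate"].nil?
end

keep_first[[100, "2019-05-6"]]["Rate"] #=> 10
keep_last[[100, "2019-05-6"]]["Rate"]  #=> 15
```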

answered May 8, 2019 at 17:07

Cary Swoveland


2 Comments

engineersmnky Over a year ago

For the first solution you could go with if h.dig(k,'Rate').nil? in place of the unless condition. dig fails fast (it returns nil as soon as k is missing), so the result would be the same.

Cary Swoveland Over a year ago

@engineersmnky, I've not seen that before. Clever!
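
A sketch of the dig variant suggested above, assuming a trimmed copy of the question's input (the names rows, result, acc and key are illustrative). Hash#dig returns nil both when the key is absent and when the stored 'Rate' is nil, which covers the two cases in which the row should be written:

```ruby
rows = [
  { "Date" => "2019-05-6", "ID" => 100, "Rate" => 10,  "Count" => 1 },
  { "Date" => "2019-05-6", "ID" => 100, "Rate" => nil, "Count" => 0 },
  { "Date" => "2019-05-6", "ID" => 102, "Rate" => nil, "Count" => 0 },
  { "Date" => "2019-05-6", "ID" => 102, "Rate" => 35,  "Count" => 0 }
]

result = rows.each_with_object({}) do |g, acc|
  key = [g["ID"], g["Date"]]
  # Write the row if nothing is stored yet, or if the stored row has a nil "Rate".
  acc.update(key => g) if acc.dig(key, "Rate").nil?
end

result.values.map { |r| r["Rate"] } #=> [10, 35]
```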

Machisuji · Accepted Answer · 2019-05-08 11:36:04Z

1

Use group_by and filter nil rates from the values.

array_of_hashes
  .group_by { |h| [h["ID"], h["Date"]] }
  .map { |key, values| [key, values.reject { |row| row["Rate"].nil? }.last] }
  .to_h

edited May 8, 2019 at 11:36

answered May 8, 2019 at 11:22

Machisuji


4 Comments

Machisuji Over a year ago

You don't want the rows where rate is nil I assume?

Galet Over a year ago

Yes. I want the rows where rate is not nil, and the input records are shuffled every time.

Galet Over a year ago

Why is .last needed here?

Machisuji Over a year ago

It is needed if we want the result you specified, where only one row is returned per key. If you want potentially several rows per ID-Date tuple, you can simply drop the .last.
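
A minimal sketch of both forms discussed in these comments. The row with ID 104 is hypothetical; it is added only to show what happens when every rate in a group is nil.

```ruby
rows = [
  { "Date" => "2019-05-6", "ID" => 100, "Rate" => 10,  "Count" => 1 },
  { "Date" => "2019-05-6", "ID" => 100, "Rate" => nil, "Count" => 0 },
  { "Date" => "2019-05-6", "ID" => 104, "Rate" => nil, "Count" => 0 } # hypothetical row
]

grouped = rows.group_by { |h| [h["ID"], h["Date"]] }

# With .last: one row per key, or nil when every rate in the group is nil.
single = grouped.map { |key, values|
  [key, values.reject { |row| row["Rate"].nil? }.last]
}.to_h

# Without .last: an array of rows per key, possibly empty.
multiple = grouped.map { |key, values|
  [key, values.reject { |row| row["Rate"].nil? }]
}.to_h

single[[104, "2019-05-6"]]   #=> nil
multiple[[104, "2019-05-6"]] #=> []
```

If keys whose value ends up nil are unwanted, they would need to be filtered out afterwards, for example with reject on the resulting hash.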
