How to remove duplicates in a hash in Ruby on Rails?

Question

I have an array of hashes like so:

[
  {
    :lname => "Brown",
    :email => "[email protected]",
    :fname => "James"
  },
  {
    :lname => nil,
    :email => "[email protected]",
    :fname => nil
  },
  {
    :lname => "Smith",
    :email => "[email protected]",
    :fname => "Brad"
  },
  {
    :lname => nil,
    :email => "[email protected]",
    :fname => nil
  },
  {
    :lname => "Smith",
    :email => "[email protected]",
    :fname => "Brad"
  },
  {
    :lname => nil,
    :email => "[email protected]",
    :fname => nil
  }
]

What I would like to learn is how to remove a record if it is a duplicate. For example, several records above share the email "[email protected]". How can I keep one of them and remove all the others that have the same email, using email as the key rather than the other fields?

Is this a pure Ruby hash or a hash that represents data actually in the database (say, via ActiveRecord)?
– Andrew Marshall, Commented Mar 6, 2011 at 2:56
Why not put validates_uniqueness_of on the email field? That way, even if you get duplicates in your params, they won't be saved. Also add the necessary error handling for when saving fails.
– corroded, Commented Mar 6, 2011 at 2:56
It's being created from a CSV list, where users can input emails to invite friends.
– AnApprentice, Commented Mar 6, 2011 at 2:57
@Corroded, I can't do that because I need to take the input, parse it, and display the output to the user. The above is the input after it has been parsed. I just need to take it to the next level by removing duplicates.
– AnApprentice, Commented Mar 6, 2011 at 2:58

dnch · Accepted Answer · 2011-03-06 03:06:54Z

26

As of Ruby 1.9.2, Array#uniq accepts a block and uses the block's return value when comparing your objects:

arrays.uniq { |h| h[:email] }
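
As a small illustration of the call above, here is a sketch run against a shortened version of the question's data. The addresses are placeholders (the real ones are obscured in the question), and the names `people` and `unique_people` are made up for this example.

```ruby
# Placeholder data modelled on the question; addresses are hypothetical.
people = [
  { :lname => "Brown", :email => "james@example.com", :fname => "James" },
  { :lname => nil,     :email => "brad@example.com",  :fname => nil },
  { :lname => "Smith", :email => "brad@example.com",  :fname => "Brad" }
]

# Rows are considered equal when the block returns the same value,
# so only one row per email survives.
unique_people = people.uniq { |person| person[:email] }
```

Array#uniq keeps the first element it sees for each key, so in this sample the surviving "brad@example.com" row is the one with nil names. If that matters, sort or merge the rows before calling uniq.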

answered Mar 6, 2011 at 3:06

dnch


1 Comment

Marc-André Lafortune Over a year ago

@AnApprentice You can use the backports gem and require 'backports/1.9.2/array/uniq'.

Harish Shetty · Accepted Answer · 2011-03-15 23:56:45Z

I know this is an old thread, but Rails has a method on 'Enumerable' called 'index_by' which can be handy in this case:

list = [
  {
    :lname => "Brown",
    :email => "[email protected]",
    :fname => "James"
  },
  {
    :lname => nil,
    :email => "[email protected]",
    :fname => nil
  },
  {
    :lname => "Smith",
    :email => "[email protected]",
    :fname => "Brad"
  },
  {
    :lname => nil,
    :email => "[email protected]",
    :fname => nil
  },
  {
    :lname => "Smith",
    :email => "[email protected]",
    :fname => "Brad"
  },
  {
    :lname => nil,
    :email => "[email protected]",
    :fname => nil
  }
]

Now you can get the unique rows as follows (note that index_by keeps the last row it sees for each email, not the first):

list.index_by {|r| r[:email]}.values
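
index_by comes from ActiveSupport, so it is not available in plain Ruby. The following is a rough plain-Ruby sketch of the same idea, not ActiveSupport's actual source; the helper name `index_rows_by_email` and the placeholder addresses are assumptions for illustration.

```ruby
# Hypothetical helper: builds a Hash keyed by email, like index_by would.
# A later row with the same email overwrites the earlier one.
def index_rows_by_email(rows)
  rows.each_with_object({}) { |row, index| index[row[:email]] = row }
end

# Placeholder data; addresses are hypothetical.
sample = [
  { :lname => "Brown", :email => "james@example.com", :fname => "James" },
  { :lname => nil,     :email => "brad@example.com",  :fname => nil },
  { :lname => "Smith", :email => "brad@example.com",  :fname => "Brad" }
]

unique_rows = index_rows_by_email(sample).values
```

Because later rows overwrite earlier ones, the "brad@example.com" entry that survives here is the one with the names filled in. That is the opposite of Array#uniq, which keeps the first occurrence.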

To merge the rows that share the same email instead:

list.group_by{|r| r[:email]}.map do |k, v|
  v.inject({}) { |r, h| r.merge(h){ |key, o, n| o || n } }
end

Custom but efficient method:

list.inject({}) do |r, h| 
  (r[h[:email]] ||= {}).merge!(h){ |key, old, new| old || new }
  r
end.values
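
To make the merge behaviour concrete, here is a sketch of the same approach wrapped in a method and run on sample data. It swaps inject for each_with_object (so the accumulator does not have to be returned from the block). The method name `merge_rows_by_email` and the addresses are placeholders.

```ruby
# Hypothetical helper: one merged row per email, preferring non-nil values.
def merge_rows_by_email(rows)
  rows.each_with_object({}) do |row, merged|
    # On a key conflict keep the existing value unless it is nil (or false).
    (merged[row[:email]] ||= {}).merge!(row) { |_key, old, new| old || new }
  end.values
end

# Placeholder data; addresses are hypothetical.
rows = [
  { :lname => "Brown", :email => "james@example.com", :fname => "James" },
  { :lname => nil,     :email => "brad@example.com",  :fname => nil },
  { :lname => "Smith", :email => "brad@example.com",  :fname => "Brad" }
]

merged_rows = merge_rows_by_email(rows)
```

Each email gets a fresh hash as its accumulator, so the input rows are not modified. Note that `old || new` also treats `false` as missing, which is fine for name fields but may not suit boolean columns.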

Andrew Marshall · Accepted Answer · 2011-03-06 03:37:02Z

6

If you're putting this directly into the database, just use validates_uniqueness_of :email in your model. See the documentation for this.

If you need to remove them from the actual array before it is used, then do:

emails = []  # This is a temporary array, not your results. The results are still in my_array
my_array.delete_if do |item|
  if emails.include? item[:email]
    true
  else
    emails << item[:email]
    false
  end
end
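
For longer lists, the `emails.include?` check above scans the temporary array on every iteration. One possible variation, sketched here with Ruby's standard Set, tracks seen emails with constant-time lookups. Unlike delete_if, this version returns a new array instead of modifying the original; the method name `dedupe_by_email` and the addresses are placeholders.

```ruby
require 'set'

# Hypothetical helper: keeps the first row for each email.
def dedupe_by_email(rows)
  seen = Set.new
  # Set#add? returns nil when the value is already present,
  # so select keeps only the first row for each email.
  rows.select { |row| seen.add?(row[:email]) }
end

# Placeholder data; addresses are hypothetical.
invites = [
  { :lname => "Brown", :email => "james@example.com", :fname => "James" },
  { :lname => "Smith", :email => "brad@example.com",  :fname => "Brad" },
  { :lname => nil,     :email => "brad@example.com",  :fname => nil }
]

deduped = dedupe_by_email(invites)
```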

UPDATE:

This will merge the contents of duplicate entries

merged_list = {}
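my_array.each do |item|
  if merged_list.has_key? item[:email]
    # Items are hashes, so use item[:email] (item.email raises NoMethodError).
    # The block keeps an existing non-nil value instead of overwriting it with nil.
    merged_list[item[:email]].merge!(item) { |key, old, new| old || new }
  else
    merged_list[item[:email]] = item
  end
end
my_array = merged_list.values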

edited Mar 6, 2011 at 3:37

answered Mar 6, 2011 at 3:04

Andrew Marshall


5 Comments

AnApprentice Over a year ago

thanks but how would this work. I don't want to lose all the other information. I want to take the hash above and remove the duplicates while retaining fname and lname.

Andrew Marshall Over a year ago

So you actually want to merge the entries with the same email address? That's different than removing duplicates, which is what you asked for.

AnApprentice Over a year ago

Not merge, just remove any duplicates based on a key of email. It can be unintelligent and just take the first [email protected] and then remove the rest, if any duplicates based solely on the email exist.

AnApprentice Over a year ago

ok my apologies. Just tried the first code snippet. It errors with "undefined method `<<' for {}:Hash"

Andrew Marshall Over a year ago

Whoops, emails should be an array, not a hash. My mistake

DigitalRoss · Accepted Answer · 2011-03-06 16:51:09Z

1

Ok, this (delete duplicates) is what you asked for:

a.sort_by { |e| e[:email] }.inject([]) { |m,e| m.last.nil? ? [e] : m.last[:email] == e[:email] ? m : m << e }

But I think this (merge values) is what you want:

a.sort_by { |e| e[:email] }.inject([]) { |m,e| m.last.nil? ? [e] : m.last[:email] == e[:email] ? (m.last.merge!(e) { |k,o,n| o || n }; m) : m << e }

Perhaps I'm stretching the one-liner idea a bit unreasonably, so with different formatting and a test case:

Aiko:so ross$ cat mergedups
require 'pp'

a = [{:fname=>"James", :lname=>"Brown", :email=>"[email protected]"},
     {:fname=>nil,     :lname=>nil,     :email=>"[email protected]"},
     {:fname=>"Brad",  :lname=>"Smith", :email=>"[email protected]"},
     {:fname=>nil,     :lname=>nil,     :email=>"[email protected]"},
     {:fname=>"Brad",  :lname=>"Smith", :email=>"[email protected]"},
     {:fname=>"Brad",  :lname=>"Smith", :email=>"[email protected]"}]

pp(
  a.sort_by { |e| e[:email] }.inject([]) do |m,e|
    m.last.nil? ? [e] :
      m.last[:email] == e[:email] ? (m.last.merge!(e) { |k,o,n| o || n }; m) :
        m << e
  end
)
Aiko:so ross$ ruby mergedups
[{:email=>"[email protected]", :fname=>"Brad", :lname=>"Smith"},
 {:email=>"[email protected]", :fname=>"James", :lname=>"Brown"}]

edited Mar 6, 2011 at 16:51

answered Mar 6, 2011 at 3:26

DigitalRoss


5 Comments

AnApprentice Over a year ago

That's snazzy only wish I knew how it did what it's doing. For extra points a little commenting

Andrew Marshall Over a year ago

What exactly does .inject([]) do?

DigitalRoss Over a year ago

@AnApprentice: sure, no problem. #inject is a method in Enumerable which is implemented by Array. In this form, it loops over the array yielding a memo and element object to the block, which returns the memo for the next iteration. So, after the sort_by, I just compare each hash with the last one in the latest memo and merge the fields if the emails match, otherwise I just tack the element onto the end of the memo, which ultimately is what inject will return as the value of the expression.

DigitalRoss Over a year ago

@Andrew, the [] is the initial value of the memo object that my expression is accumulating. Ultimately it will be the new array with the merged hash elements and will be returned by the last "iteration" of #inject. #inject isn't hugely different from plain old #each, it just returns a value and also accumulates that value for you by yielding it with each element to the block as it iterates.

Andrew Marshall Over a year ago

Yea I was looking at the doc. The craziness of that being all one line made me miss that it was calling on a block and I got really confused as to how [] was a symbol :P. Thanks for the explanation!
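
The #inject explanation in the comments above can be sketched in a more spread-out form. This is not DigitalRoss's exact code: it uses an if/else instead of nested ternaries and appends copies of the rows so the input is not mutated. The variable names and addresses are placeholders.

```ruby
# Placeholder data; addresses are hypothetical.
rows = [
  { :fname => "James", :lname => "Brown", :email => "james@example.com" },
  { :fname => nil,     :lname => nil,     :email => "brad@example.com" },
  { :fname => "Brad",  :lname => "Smith", :email => "brad@example.com" }
]

# Sorting puts rows with the same email next to each other, so each row
# only needs to be compared with the last entry in the memo.
merged = rows.sort_by { |row| row[:email] }.inject([]) do |memo, row|
  previous = memo.last
  if previous && previous[:email] == row[:email]
    # Same email as the previous entry: fill in any nil fields.
    previous.merge!(row) { |_key, old, new| old || new }
  else
    # New email: append a copy of the row.
    memo << row.dup
  end
  memo # the block's return value becomes the memo for the next iteration
end
```

The `[]` passed to inject is the starting memo, and whatever the block returns on the final iteration is the value of the whole expression. Forgetting the trailing `memo` line is a common slip, since merge! would otherwise return a hash rather than the accumulating array.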
