
I have an array of hashes like so:

[
  {
    :lname => "Brown",
    :email => "[email protected]",
    :fname => "James"
  },
  {
    :lname => nil,
    :email => "[email protected]",
    :fname => nil
  },
  {
    :lname => "Smith",
    :email => "[email protected]",
    :fname => "Brad"
  },
  {
    :lname => nil,
    :email => "[email protected]",
    :fname => nil
  },
  {
    :lname => "Smith",
    :email => "[email protected]",
    :fname => "Brad"
  },
  {
    :lname => nil,
    :email => "[email protected]",
    :fname => nil
  }
]

What I would like to learn is how to remove a record if it is a duplicate. Notice that several records share the same "[email protected]" address: how can I keep one of them and remove all the others that have that email? In other words, email should be the key for deciding what counts as a duplicate, not the other fields.

  • Is this a pure Ruby hash or a hash that represents data actually in the database (say, via ActiveRecord)? Commented Mar 6, 2011 at 2:56
  • Why not put validates_uniqueness_of on the email field? That way, even if you get duplicate data in your params, it won't be saved. Also add the necessary error handling for when saving fails. Commented Mar 6, 2011 at 2:56
  • It's being created from a CSV list, where users can input emails to invite friends. Commented Mar 6, 2011 at 2:57
  • @Corroded, I can't do that because I need to take the input, parse it, and display the output to the user. The above is after the input has been parsed. I just need to take it to the next level by removing duplicates. Commented Mar 6, 2011 at 2:58

4 Answers


Since Ruby 1.9.2, Array#uniq accepts a block and compares elements by the block's return value, keeping the first element for each distinct value:

arrays.uniq { |h| h[:email] }
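A minimal runnable sketch of the uniq approach. The real addresses in the question are obscured, so the emails below are made-up placeholders. Because uniq keeps the first element for each email, the record with the names filled in survives when it appears first.

```ruby
# Sample records; the email addresses are hypothetical placeholders.
records = [
  { :lname => "Brown", :email => "james@example.com", :fname => "James" },
  { :lname => nil,     :email => "james@example.com", :fname => nil },
  { :lname => "Smith", :email => "brad@example.com",  :fname => "Brad" },
  { :lname => nil,     :email => "brad@example.com",  :fname => nil }
]

# uniq compares the block's return value (the email) and keeps the
# first element seen for each distinct email.
unique = records.uniq { |h| h[:email] }

unique.each { |h| puts "#{h[:email]}: #{h[:fname]} #{h[:lname]}" }
# prints:
# james@example.com: James Brown
# brad@example.com: Brad Smith
```

If the nameless record came first in the input, it would be the one kept; uniq does not merge fields.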

1 Comment

@AnApprentice You can use the backports gem and require 'backports/1.9.2/array/uniq'.

I know this is an old thread, but Rails (through ActiveSupport) adds a method called index_by to Enumerable, which can be handy in this case:

list = [
  {
    :lname => "Brown",
    :email => "[email protected]",
    :fname => "James"
  },
  {
    :lname => nil,
    :email => "[email protected]",
    :fname => nil
  },
  {
    :lname => "Smith",
    :email => "[email protected]",
    :fname => "Brad"
  },
  {
    :lname => nil,
    :email => "[email protected]",
    :fname => nil
  },
  {
    :lname => "Smith",
    :email => "[email protected]",
    :fname => "Brad"
  },
  {
    :lname => nil,
    :email => "[email protected]",
    :fname => nil
  }
]

Now you can get the unique rows as follows:

list.index_by {|r| r[:email]}.values
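Outside Rails, roughly the same result can be sketched in plain Ruby by building a Hash keyed by email. One behavioral difference from uniq worth knowing: assigning into a Hash means the last record for each email wins, not the first. The placeholder addresses are hypothetical.

```ruby
list = [
  { :lname => "Brown", :email => "james@example.com", :fname => "James" },
  { :lname => nil,     :email => "james@example.com", :fname => nil },
  { :lname => "Smith", :email => "brad@example.com",  :fname => "Brad" }
]

# Plain-Ruby stand-in for index_by: each later record overwrites the
# earlier one stored under the same email key.
by_email = list.each_with_object({}) { |r, h| h[r[:email]] = r }

unique = by_email.values
puts unique.size                    # prints 2
puts unique.first[:fname].inspect   # prints nil (the later, nameless James record won)
```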

To merge the rows that share the same email instead, keeping the first non-nil value for each field:

list.group_by{|r| r[:email]}.map do |k, v|
  v.inject({}) { |r, h| r.merge(h){ |key, o, n| o || n } }
end
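The merge step relies on the block form of Hash#merge, which is only called for keys present in both hashes. A small sketch of why the `o || n` block matters (the :phone field is a hypothetical extra key, added only to show that non-conflicting keys pass through untouched):

```ruby
first  = { :lname => "Smith", :email => "brad@example.com", :fname => "Brad" }
second = { :lname => nil,     :email => "brad@example.com", :fname => nil }
third  = { :lname => "Smith", :email => "brad@example.com", :fname => "Bradley",
           :phone => "555-0100" }   # :phone is a hypothetical extra field

# Without a block, the right-hand hash wins, so nil overwrites "Brad".
plain = first.merge(second)
puts plain[:fname].inspect   # prints nil

# With the block, the old value is kept unless it is nil (or false),
# so the first non-nil value for each key survives.
keep_first = [first, second, third].inject({}) do |r, h|
  r.merge(h) { |_key, o, n| o || n }
end
puts keep_first[:fname]   # prints Brad
puts keep_first[:phone]   # prints 555-0100
```

Note that `o || n` also treats a stored `false` as missing; that is harmless for string fields like these.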

A custom but efficient single-pass method that produces the same merge:

list.inject({}) do |r, h| 
  (r[h[:email]] ||= {}).merge!(h){ |key, old, new| old || new }
  r
end.values
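The same single pass can be sketched with each_with_object, which hands the same accumulator Hash to every iteration, so the block does not have to return it the way inject requires. The sample data is hypothetical; it starts with a nameless record to show that later names still fill the gaps.

```ruby
list = [
  { :lname => nil,     :email => "brad@example.com",  :fname => nil },
  { :lname => "Smith", :email => "brad@example.com",  :fname => "Brad" },
  { :lname => "Brown", :email => "james@example.com", :fname => "James" }
]

merged = list.each_with_object({}) do |h, acc|
  # Start each email with a fresh Hash (so the input hashes are not
  # mutated), then keep the first non-nil value seen for each field.
  (acc[h[:email]] ||= {}).merge!(h) { |_key, old_val, new_val| old_val || new_val }
end.values

merged.each { |h| puts "#{h[:email]}: #{h[:fname]} #{h[:lname]}" }
# prints:
# brad@example.com: Brad Smith
# james@example.com: James Brown
```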

Comments


If you're putting this directly into the database, just use validates_uniqueness_of :email in your model. See the Rails documentation for this validation.

If you need to remove the duplicates from the actual array before it is used, do:

emails = []  # This is a temporary array, not your results. The results are still in my_array
my_array.delete_if do |item|
  if emails.include? item[:email]
    true
  else
    emails << item[:email]
    false
  end
end
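A variant of the same delete_if idea, swapping the temporary Array for a Set so each membership check is a hash lookup rather than a linear scan of the emails seen so far. Set#add? returns nil when the element is already present, which gives a compact "delete if seen before" test. Sample data is hypothetical.

```ruby
require 'set'

my_array = [
  { :lname => "Brown", :email => "james@example.com", :fname => "James" },
  { :lname => nil,     :email => "james@example.com", :fname => nil },
  { :lname => "Smith", :email => "brad@example.com",  :fname => "Brad" }
]

seen = Set.new
# add? returns the set when the email is new (keep the item) and nil
# when it was already there (so the negation is true: delete it).
my_array.delete_if { |item| !seen.add?(item[:email]) }

puts my_array.size   # prints 2
```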

UPDATE:

This will merge the contents of duplicate entries

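merged_list = {}
my_array.each do |item|
  if merged_list.has_key? item[:email]
    # Keep existing non-nil values and fill any gaps from the duplicate
    merged_list[item[:email]].merge!(item) { |_key, old, new| old || new }
  else
    # dup so that merge! does not modify the original hash in my_array
    merged_list[item[:email]] = item.dup
  end
end
my_array = merged_list.values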

5 Comments

Thanks, but how would this work? I don't want to lose the other information. I want to take the array above and remove the duplicates while retaining fname and lname.
So you actually want to merge the entries with the same email address? That's different than removing duplicates, which is what you asked for.
Not merge, just remove any duplicates based on the email key. It can be unintelligent and just take the first [email protected] and then remove the rest, if any duplicates based solely on the email exist.
OK, my apologies. I just tried the first code snippet. It errors with "undefined method `<<' for {}:Hash".
Whoops, emails should be an array, not a hash. My mistake

Ok, this (delete duplicates) is what you asked for:

a.sort_by { |e| e[:email] }.inject([]) { |m,e| m.last.nil? ? [e] : m.last[:email] == e[:email] ? m : m << e }

But I think this (merge values) is what you want:

a.sort_by { |e| e[:email] }.inject([]) { |m,e| m.last.nil? ? [e] : m.last[:email] == e[:email] ? (m.last.merge!(e) { |k,o,n| o || n }; m) : m << e }

Perhaps I'm stretching the one-liner idea a bit unreasonably, so with different formatting and a test case:

Aiko:so ross$ cat mergedups
require 'pp'

a = [{:fname=>"James", :lname=>"Brown", :email=>"[email protected]"},
     {:fname=>nil,     :lname=>nil,     :email=>"[email protected]"},
     {:fname=>"Brad",  :lname=>"Smith", :email=>"[email protected]"},
     {:fname=>nil,     :lname=>nil,     :email=>"[email protected]"},
     {:fname=>"Brad",  :lname=>"Smith", :email=>"[email protected]"},
     {:fname=>"Brad",  :lname=>"Smith", :email=>"[email protected]"}]

pp(
  a.sort_by { |e| e[:email] }.inject([]) do |m,e|
    m.last.nil? ? [e] :
      m.last[:email] == e[:email] ? (m.last.merge!(e) { |k,o,n| o || n }; m) :
        m << e
  end
)
Aiko:so ross$ ruby mergedups
[{:email=>"[email protected]", :fname=>"Brad", :lname=>"Smith"},
 {:email=>"[email protected]", :fname=>"James", :lname=>"Brown"}]

5 Comments

That's snazzy; I only wish I knew how it does what it's doing. For extra points, a little commenting?
What exactly does .inject([]) do?
@AnApprentice: sure, no problem. #inject is a method in Enumerable, which Array includes. In this form, it loops over the array yielding a memo and element object to the block, which returns the memo for the next iteration. So, after the sort_by, I just compare each hash with the last one in the latest memo and merge the fields if the emails match; otherwise I just tack the element onto the end of the memo, which ultimately is what inject will return as the value of the expression.
@Andrew, the [] is the initial value of the memo object that my expression is accumulating. Ultimately it will be the new array with the merged hash elements and will be returned by the last "iteration" of #inject. #inject isn't hugely different from plain old #each, it just returns a value and also accumulates that value for you by yielding it with each element to the block as it iterates.
Yea I was looking at the doc. The craziness of that being all one line made me miss that it was calling on a block and I got really confused as to how [] was a symbol :P. Thanks for the explanation!
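To make the memo mechanics described in these comments concrete, here is a tiny trace on plain strings (hypothetical placeholder emails). It also shows why the answer sorts first: comparing each element only with the memo's last entry removes consecutive duplicates, so equal emails must be adjacent.

```ruby
emails = ["a@example.com", "a@example.com", "b@example.com"]

# inject([]) starts with an empty Array as the memo. Each block call gets
# the current memo and the next element; whatever the block returns
# becomes the memo for the following call.
result = emails.inject([]) do |memo, e|
  puts "memo=#{memo.inspect} element=#{e}"
  memo.last == e ? memo : memo << e   # skip e if it repeats the previous entry
end

puts result.inspect   # prints ["a@example.com", "b@example.com"]
```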
