
As you can see in the current code below, I am finding the duplicate based on the attribute recordable_id. What I need to do is find the duplicate based on four matching attributes: user_id, recordable_type, hero_type, recordable_id. How must I modify the code?

heroes = User.heroes

for hero in heroes
  hero_statuses = hero.hero_statuses

  seen = []

  hero_statuses.sort! {|a,b| a.created_at <=> b.created_at } # sort by created_at
  hero_statuses.each do |hero_status|
    if seen.map(&:recordable_id).include? hero_status.recordable_id # check if the id has been seen already
      hero_status.revoke
    else
      seen << hero_status # if not, add it to the seen array
    end
  end
end
  • I want to get rid of the duplicates with the most-recent created_at date, leaving the record that was initially created. Commented May 8, 2010 at 1:35

2 Answers


Try this:

HeroStatus.all(:group =>  "user_id, recordable_type, hero_type, recordable_id",
               :having => "count(*) > 1").each do |status|
  status.revoke 
end

Edit 2: To revoke all of the later duplicate entries, keeping the earliest record in each group, do the following:

HeroStatus.all(:joins => "JOIN (
     SELECT   user_id, recordable_type, hero_type, 
              recordable_id, MIN(created_at) AS created_at
     FROM     hero_statuses
     GROUP BY user_id, recordable_type, hero_type, recordable_id
     HAVING   COUNT(*) > 1
   ) AS A ON A.user_id         = hero_statuses.user_id         AND 
             A.recordable_type = hero_statuses.recordable_type AND
             A.hero_type       = hero_statuses.hero_type       AND
             A.recordable_id   = hero_statuses.recordable_id   AND
             A.created_at      < hero_statuses.created_at
").each do |status|
  status.revoke 
end

4 Comments

Nice. This is getting me closer. Here's one more wrinkle that I forgot to mention. I want to get rid of the duplicates with the most-recent created_at date, leaving the record that was initially created.
Updated the answer, take a look.
thx kan! this works great. i was looking for ruby-based answer, so going to check off marc's. but appreciate your assistance!
I didn't give you a Ruby-based solution because it is very inefficient for your use case. This is something the DB does best. You can use the Ruby approach for a small data set (< 1000 statuses); otherwise the DB approach is better.

Using straight Ruby (not the SQL server):

heroes = User.heroes

for hero in heroes
  hero_statuses = hero.hero_statuses

  seen = {}

  hero_statuses.sort_by!(&:created_at)
  hero_statuses.each do |status|
    key = [status.user_id, status.recordable_type, status.hero_type, status.recordable_id]
    if seen.has_key?(key)
      status.revoke
    else
      seen[key] = status # if not, record it in the seen hash
    end
  end

  remaining = seen.values
end

For lookups, use a Hash (or a Set) rather than scanning an Array. I chose a Hash here because it also gives you the statuses that were kept, via seen.values.
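
If you only need the duplicates and not the kept records, the Set variant mentioned above could look roughly like this. It is a sketch, not tested against the real models: Status is a hypothetical Struct standing in for HeroStatus, and later_duplicates is a name made up for illustration. It relies on Set#add?, which returns nil when the element is already in the set.

```ruby
require "set"

# Hypothetical stand-in for a HeroStatus record (assumption, not the real model).
Status = Struct.new(:user_id, :recordable_type, :hero_type, :recordable_id, :created_at)

# Returns the statuses that repeat an earlier status on all four attributes.
def later_duplicates(statuses)
  seen = Set.new
  statuses.sort_by(&:created_at).reject do |status|
    key = [status.user_id, status.recordable_type, status.hero_type, status.recordable_id]
    # add? returns the set (truthy) for a new key, so first occurrences are rejected;
    # it returns nil for a key already present, so later duplicates are kept in the result.
    seen.add?(key)
  end
end
```

With real records you would then call revoke on each element of the returned array.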

Note: I used sort_by!, which is new in Ruby 1.9.2; on older versions, use sort_by (or require "backports").
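
For reference, here is a variant that avoids sort_by! entirely by grouping first and using the non-destructive sort_by inside each group. Again a sketch under assumptions: StatusRow is a hypothetical Struct standing in for HeroStatus, and partition_duplicates is an illustrative name, not part of any library.

```ruby
# Hypothetical stand-in for a HeroStatus record (assumption, not the real model).
StatusRow = Struct.new(:user_id, :recordable_type, :hero_type, :recordable_id, :created_at)

# Splits statuses into [kept, duplicates], keeping the earliest record of each group.
def partition_duplicates(statuses)
  groups = statuses.group_by do |s|
    [s.user_id, s.recordable_type, s.hero_type, s.recordable_id]
  end

  kept = []
  duplicates = []
  groups.each_value do |group|
    # sort_by returns a new array, so the input is left untouched.
    first, *rest = group.sort_by(&:created_at)
    kept << first
    duplicates.concat(rest)
  end
  [kept, duplicates]
end
```

You would call revoke on each element of the second array; the first array plays the same role as seen.values in the loop above.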
