3

I have the following table:

Relations

[id,user_id,status]
1,2,sent_reply
1,2,sent_mention
1,3,sent_mention
1,4,sent_reply
1,4,sent_mention

I am looking for a way to remove duplicates, so that only the following rows will remain:

1,2,sent_reply
1,3,sent_mention
1,4,sent_reply

(Preferably using Rails)

5
  • Do you want to return only unique items, or do you want to remove all duplicates? Commented Apr 12, 2011 at 16:33
  • Also, all of your relations have the same id. Commented Apr 12, 2011 at 16:34
  • So you want only one single (id, user_id) pair, regardless of the status? How do you decide which 'status' message to keep? Last one recorded? First one? Random? Commented Apr 12, 2011 at 16:44
  • I want to remove duplicates as judged by the first two fields (id, user_id). My example is a bit misleading in that id isn't a primary key (which would be unique) but some other id (think of it as member_id). Commented Apr 14, 2011 at 9:29
  • @marc-b Good point, I want to keep the "sent_reply" records. Commented Apr 14, 2011 at 9:31

2 Answers

3

I know this is way late, but I found a good way to do it using Rails 3. There are probably better ways, though, and I don't know how this will perform with 100,000+ rows of data, but this should get you on the right track.

# Get a hash of all id/user_id pairs and how many records of each pair
counts = ModelName.group([:id, :user_id]).count
# => {[1, 2]=>2, [1, 3]=>1, [1, 4]=>2}

# Keep only those pairs that have more than one record
dupes = counts.select{|attrs, count| count > 1}
# => {[1, 2]=>2, [1, 4]=>2}

# Map objects by the attributes we have
object_groups = dupes.map do |attrs, count|
  ModelName.where(:id => attrs[0], :user_id => attrs[1])
end

# Take each group and #destroy the records you want.
# Or call #delete instead to save time if you don't need ActiveRecord callbacks
# Here I'm just keeping the first one I find.
object_groups.each do |group|
  group.each_with_index do |object, index|
    object.destroy unless index == 0
  end
end
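
The asker clarified in the comments that the "sent_reply" record is the one to keep, whereas the loop above keeps whichever record comes back first. Here is a minimal plain-Ruby sketch of that selection rule. It assumes the rows are already loaded into memory; `Row` and `split_duplicates` are hypothetical names made up for illustration, not part of Rails or of the original answer.

```ruby
# Hypothetical stand-in for one row of the Relations table
# (a plain Struct, not an ActiveRecord model).
Row = Struct.new(:id, :user_id, :status)

# Splits rows into [keepers, duplicates], grouping by the (id, user_id) pair.
# Within each group, the row with the preferred status is kept if there is one;
# otherwise the first row of the group is kept.
def split_duplicates(rows, preferred_status: "sent_reply")
  keepers = []
  duplicates = []
  rows.group_by { |r| [r.id, r.user_id] }.each_value do |group|
    keeper = group.find { |r| r.status == preferred_status } || group.first
    keepers << keeper
    # Compare by object identity so only the chosen row is excluded.
    duplicates.concat(group.reject { |r| r.equal?(keeper) })
  end
  [keepers, duplicates]
end

rows = [
  Row.new(1, 2, "sent_reply"),
  Row.new(1, 2, "sent_mention"),
  Row.new(1, 3, "sent_mention"),
  Row.new(1, 4, "sent_reply"),
  Row.new(1, 4, "sent_mention"),
]

keepers, duplicates = split_duplicates(rows)
```

With real models, the same rule could be applied inside the `object_groups.each` loop by picking the keeper first and destroying the rest of the group.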

-1

It is better to do it through SQL. But if you prefer to use Rails:

(Relation.all - Relation.all.uniq_by{|r| [r.user_id, r.status]}).each{ |d| d.destroy }

or

ids = Relation.all.uniq_by{|r| [r.user_id, r.status]}.map(&:id)
# "IS NOT IN" is not valid SQL; the operator is "NOT IN"
Relation.where("id NOT IN (?)", ids).destroy_all # or delete_all, which is faster

But I don't like this solution :D

1 Comment

This would be very slow and memory-consuming (my relations table has 100,000+ rows). Is there a more SQL-ish way to do this? At this point, it's not very important to wrap it in Rails.
