Delete duplicate records based on multiple columns

Question

In our system we run hourly imports from an external database. Due to an error in the import scripts, there are now some duplicate records.

A record counts as a duplicate when another record has the same :legacy_id and :company.

What code can I run to find and delete these duplicates?

I was playing around with this:

Product.select(:legacy_id,:company).group(:legacy_id,:company).having("count(*) > 1")

It seemed to return some of the duplicates, but I wasn't sure how to delete them from there.

Any ideas?

That worked great @argentum47, can't believe I missed that when I was browsing.
– bnussey, Commented Nov 28, 2014 at 15:30

souslov · Accepted Answer · 2018-11-20 10:24:39Z

You can try the following approach:

Product.where.not(
  id: Product.group(:legacy_id, :company).pluck('min(products.id)')
).delete_all

Or in pure SQL:

DELETE FROM products
WHERE id NOT IN (
  SELECT MIN(p.id) FROM products p GROUP BY p.legacy_id, p.company
);
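
To make the logic of both versions easier to follow, here is a minimal plain-Ruby sketch that applies the same rule to an in-memory array instead of a database. The sample rows and the variable names (rows, keeper_ids, ids_to_delete) are hypothetical and only for illustration. It assumes, as the queries above do, that the record with the lowest id in each (legacy_id, company) group is the one to keep.

```ruby
# Hypothetical sample rows standing in for the products table.
rows = [
  { id: 1, legacy_id: 100, company: "acme" },
  { id: 2, legacy_id: 100, company: "acme" },   # duplicate of id 1
  { id: 3, legacy_id: 100, company: "globex" }, # same legacy_id, other company: not a duplicate
  { id: 4, legacy_id: 200, company: "acme" },
  { id: 5, legacy_id: 100, company: "acme" }    # duplicate of id 1
]

# Equivalent of GROUP BY legacy_id, company combined with min(id):
# one surviving id per (legacy_id, company) pair.
keeper_ids = rows
  .group_by { |row| [row[:legacy_id], row[:company]] }
  .map { |_key, group| group.map { |row| row[:id] }.min }

# Equivalent of WHERE id NOT IN (...): every id that is not a keeper.
ids_to_delete = rows.map { |row| row[:id] } - keeper_ids

puts "keep:   #{keeper_ids.inspect}"
puts "delete: #{ids_to_delete.inspect}"
```

Running the same selection against real data first, and checking which ids it would remove before calling delete_all, is a cheap way to confirm that the grouping columns match your definition of a duplicate.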

answered Feb 8, 2016 at 1:44

1 Comment

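If you have associated records that you want to delete as well, use destroy_all instead of delete_all. destroy_all loads each record and runs its callbacks (including dependent: :destroy), whereas delete_all issues a single SQL DELETE and skips them.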