In our system we run hourly imports from an external database. Due to an error in the import scripts, there are now some duplicate records.

A record counts as a duplicate if another record has the same :legacy_id and :company.

What code can I run to find and delete these duplicates?

I was playing around with this:

Product.select(:legacy_id,:company).group(:legacy_id,:company).having("count(*) > 1")

It seemed to return some of the duplicates, but I wasn't sure how to delete them from there.
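The grouped query above returns only the legacy_id and company of each duplicated pair, not the ids of the rows themselves. That is why its result gives you nothing obvious to delete. Below is a minimal plain-Ruby sketch of what the query computes. It uses a hypothetical Row struct and made-up sample rows in place of the real products table, so it is not ActiveRecord code:

```ruby
# Hypothetical in-memory stand-in for rows of the products table.
Row = Struct.new(:id, :legacy_id, :company)

rows = [
  Row.new(1, 100, "Acme"),
  Row.new(2, 100, "Acme"),   # duplicate of id 1
  Row.new(3, 200, "Acme"),
  Row.new(4, 100, "Globex"), # same legacy_id, different company: not a duplicate
  Row.new(5, 200, "Acme")    # duplicate of id 3
]

# Equivalent of group(:legacy_id, :company).having("count(*) > 1"):
# bucket rows by the pair of columns, then keep buckets with more than one row.
duplicate_groups = rows
  .group_by { |r| [r.legacy_id, r.company] }
  .select { |_key, group| group.size > 1 }

duplicate_groups.each do |(legacy_id, company), group|
  puts "#{legacy_id}/#{company}: ids #{group.map(&:id).inspect}"
end
# prints:
# 100/Acme: ids [1, 2]
# 200/Acme: ids [3, 5]
```

Each group still contains every copy, including the one you want to keep. To delete safely, you first have to pick one survivor per group.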

Any ideas?

  • stackoverflow.com/questions/14124212/… Does this help? Commented Nov 28, 2014 at 13:54
  • That worked great, @argentum47. Can't believe I missed that when I was browsing. Commented Nov 28, 2014 at 15:30

1 Answer

You can try the following approach:

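# Keep the lowest id in each (legacy_id, company) group and delete every other row.
# Arel.sql marks the raw SQL as trusted; newer Rails versions may warn about or reject
# raw SQL strings passed to pluck without it.
Product.where.not(
  id: Product.group(:legacy_id, :company).pluck(Arel.sql('min(products.id)'))
).delete_all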
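The query above follows one rule: keep one row per (legacy_id, company) pair, choosing the lowest id as the survivor, and delete the rest. Here is a plain-Ruby sketch of that rule over a hypothetical in-memory Row struct (not ActiveRecord), assuming ids are unique:

```ruby
require "set"

# Hypothetical in-memory stand-in for rows of the products table.
Row = Struct.new(:id, :legacy_id, :company)

# For each (legacy_id, company) pair, keep the row with the lowest id and mark
# every other row for deletion. Returns [kept_rows, deleted_ids].
def dedupe_keep_lowest_id(rows)
  keep_ids = rows
    .group_by { |r| [r.legacy_id, r.company] }
    .map { |_key, group| group.map(&:id).min } # like pluck('min(products.id)')
    .to_set

  # Like where.not(id: keep_ids): everything not kept is deleted.
  kept, deleted = rows.partition { |r| keep_ids.include?(r.id) }
  [kept, deleted.map(&:id)]
end

rows = [
  Row.new(1, 100, "Acme"),
  Row.new(2, 100, "Acme"),
  Row.new(3, 200, "Acme"),
  Row.new(4, 100, "Globex"),
  Row.new(5, 200, "Acme")
]

kept, deleted_ids = dedupe_keep_lowest_id(rows)
p kept.map(&:id) # prints [1, 3, 4]
p deleted_ids    # prints [2, 5]
```

When ids auto-increment, keeping the minimum id usually keeps the earliest-inserted copy. Using max would keep the newest copy instead. Note that the ActiveRecord version loads every surviving id into Ruby and sends the list back in a NOT IN clause. On a very large table, the single-statement SQL version avoids that round trip.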

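Or in pure SQL. This runs as written on PostgreSQL and SQLite. MySQL rejects it (error 1093) because the subquery selects from the same table the DELETE targets: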

-- keep the lowest id per (legacy_id, company) group; delete the rest
delete from products
where id not in (
  select min(p.id) from products p group by p.legacy_id, p.company
);

1 Comment

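If you have associated records that you want to delete as well, use destroy_all instead of delete_all. destroy_all loads each record and runs its callbacks, so associations declared with dependent: :destroy are removed too. delete_all issues a single DELETE and skips callbacks.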
