Find and Delete Duplicate rows in MySQL

Question

I'm having trouble finding duplicates in a database table with the following setup:

==========================================================================
| stock_id  | product_id  | store_id  | stock_qty  | updated_at          |
==========================================================================
| 9990      | 51          | 1         | 13         | 2014-10-25 16:30:01 |
| 9991      | 90          | 2         | 5          | 2014-10-25 16:30:01 |
| 9992      | 161         | 1         | 3          | 2014-10-25 16:30:01 |
| 9993      | 254         | 1         | 18         | 2014-10-25 16:30:01 |
| 9994      | 284         | 2         | 12         | 2014-10-25 16:30:01 |
| 9995      | 51          | 1         | 11         | 2014-10-25 17:30:02 |
| 9996      | 90          | 2         | 5          | 2014-10-25 17:30:02 |
| 9997      | 161         | 1         | 3          | 2014-10-25 17:30:02 |
| 9998      | 254         | 1         | 16         | 2014-10-25 17:30:02 |
| 9999      | 284         | 2         | 12         | 2014-10-25 17:30:02 |
==========================================================================

Stock updates are imported into this table every hour, I'm trying to find duplicate stock entries (any rows which have a matching product id and store id) so I can delete the oldest. The query below is my attempt, by comparing product ids and store ids on a join like this I can find one set of duplicates:

SELECT s.`stock_id`, s.`product_id`, s.`store_id`, s.`stock_qty`, s.`updated_at`
FROM `stock` s
INNER JOIN `stock` j ON s.`product_id`=j.`product_id` AND s.`store_id`=j.`store_id`
GROUP BY `stock_id`
HAVING COUNT(*) > 1
ORDER BY s.updated_at DESC, s.product_id ASC, s.store_id ASC, s.stock_id ASC;

While this query will work, it doesn't find ALL duplicates, only 1 set, which means if an import goes awry and isn't noticed until the morning, there's a possibility that we'll be left with tons of duplicate stock entries. My MySQL skills are sadly lacking and I'm at a complete loss about how to find and delete all duplicates in a fast, reliable manner.

Any help or ideas are welcome. Thanks

hi @jbutler483, great question, the system we're working in has an ORM that requires the existence of a single primary key. We really on the ORM in other ways for scaffolding the CRUD management via the same system. — Herbage Onion
– Herbage Onion, Commented Nov 27, 2014 at 16:04
Plus the database will keep the oldest data, which is not the required functionality :) — Artjoman
– Artjoman, Commented Nov 27, 2014 at 16:05
@HerbageOnion I can't see why it matters that there's a single PK - as long as you can have a unique key constraint on (product_id,store_id). Then INSERT... ON DUPLICATE KEY UPDATE will solve the problem — Strawberry
– Strawberry, Commented Nov 27, 2014 at 17:06

Kruti Patel · Accepted Answer · 2016-08-03 12:21:26Z

2

You can use this query:

DELETE st FROM stock st, stock st2 
WHERE st.stock_id < st2.stock_id AND st.product_id = st2.product_id AND 
st.store_id = st2.store_id;

This query will delete older record having same product_id and store_id and will keep latest record.

answered Aug 3, 2016 at 12:21

Kruti Patel

1,4722 gold badges24 silver badges37 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Hartmut Holzgraefe · Accepted Answer · 2014-11-27 16:34:18Z

1

A self join on store_id, product_id and 'is older' in combination with DISTINCT should give you all rows where also a newer version exists:

> SHOW CREATE TABLE stock;
CREATE TABLE `stock` (
  `stock_id` int(11) NOT NULL,
  `product_id` int(11) DEFAULT NULL,
  `store_id` int(11) DEFAULT NULL,
  `stock_qty` int(11) DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  PRIMARY KEY (`stock_id`)

> select * from stock;
+----------+------------+----------+-----------+---------------------+
| stock_id | product_id | store_id | stock_qty | updated_at          |
+----------+------------+----------+-----------+---------------------+
|        1 |          1 |        1 |         1 | 2001-01-01 12:00:00 |
|        2 |          2 |        2 |         1 | 2001-01-01 12:00:00 |
|        3 |          2 |        2 |         1 | 2002-01-01 12:00:00 |
+----------+------------+----------+-----------+---------------------+


> SELECT DISTINCT s1.stock_id, s1.store_id, s1.product_id, s1.updated_at   
   FROM stock s1   JOIN stock s2     
     ON s1.store_id = s2.store_id    
    AND s1.product_id = s2.product_id    
    AND s1.updated_at < s2.updated_at;
+----------+----------+------------+---------------------+
| stock_id | store_id | product_id | updated_at          |
+----------+----------+------------+---------------------+
|        2 |        2 |          2 | 2001-01-01 12:00:00 |
+----------+----------+------------+---------------------+

> DELETE stock FROM stock 
               JOIN stock s2  ON stock.store_id = s2.store_id  
                             AND stock.product_id = s2.product_id 
                             AND stock.updated_at < s2.updated_at;
Query OK, 1 row affected (0.02 sec)

> select * from stock;
+----------+------------+----------+-----------+---------------------+
| stock_id | product_id | store_id | stock_qty | updated_at          |
+----------+------------+----------+-----------+---------------------+
|        1 |          1 |        1 |         1 | 2001-01-01 12:00:00 |
|        3 |          2 |        2 |         1 | 2002-01-01 12:00:00 |
+----------+------------+----------+-----------+---------------------+

edited Nov 27, 2014 at 16:34

answered Nov 27, 2014 at 16:12

Hartmut Holzgraefe

2,76515 silver badges15 bronze badges

9 Comments

Hartmut Holzgraefe Over a year ago

Which can be combined with DELETE by only selecting s1.stock_id and combining that query with DELETE as ` DELETE FROM stock WHERE stock_id IN (SELECT DISTINCT s1.stock_id FROM ...) `

Herbage Onion Over a year ago

Thanks for your help Hartmut, did you mean to have s1.store_id = s2.store_id instead of s1.stock_id = s2.stock_id? I've tried that and I don't get any results...

Herbage Onion Over a year ago

Just to be clear, there are no results but also no errors. Thanks for your help so far.

Hartmut Holzgraefe Over a year ago

please try the query in its current (edited/fixed) form

Herbage Onion Over a year ago

I tried it again, I think the sq is another typo, so I used s1, still no results though

|

Artjoman · Accepted Answer · 2014-11-27 17:03:04Z

1

Or you can use a stored Procedure:

DELIMITER //
DROP PROCEDURE IF EXISTS removeDuplicates;


 CREATE PROCEDURE removeDuplicates(
   stockID INT
 )
 BEGIN


    DECLARE stockToKeep INT;
    DECLARE storeID INT;
    DECLARE productID INT;

 -- gets the store and product value
 SELECT DISTINCT store_id, product_id
 FROM stock
  WHERE stock_id = stockID  
  LIMIT 1
 INTO
  storeID, productID;

 SELECT stock_id
 FROM stock
  WHERE product_id = productID AND store_id = storeID  
  ORDER BY updated_at DESC
  LIMIT 1
 INTO
  stockToKeep;

    DELETE FROM stock 
    WHERE product_id = productID AND store_id = storeID 
    AND stock_id != stockToKeep;
END //
DELIMITER ;

And afterwards call it for every pair of the product id and store id via a cursor procedure: DELIMITER // CREATE PROCEDURE updateTable() BEGIN DECLARE done BOOLEAN DEFAULT FALSE; DECLARE stockID INT UNSIGNED; DECLARE cur CURSOR FOR SELECT DISTINCT stock_id FROM stock; DECLARE CONTINUE HANDLER FOR NOT FOUND SET done := TRUE;

  OPEN cur;

  testLoop: LOOP
    FETCH cur INTO stockID;
    IF done THEN
      LEAVE testLoop;
    END IF;
    CALL removeDuplicates(stockID);
  END LOOP testLoop;

  CLOSE cur;
END//
DELIMITER ;

And then just call the second procedure

CALL updateTable();

edited Nov 27, 2014 at 17:03

answered Nov 27, 2014 at 16:33

Artjoman

2861 silver badge9 bronze badges

1 Comment

Herbage Onion Over a year ago

Hi thanks for the detailed answer, I have to be honest, this scares the crap out of me, I don't doubt you know your stuff and that's probably one hell of a solution but I need to be able to maintain the solution I implement and I don't think I could with that :( Thanks very much for your time, I'll be deconstructing your solution to learn more about stored procedures!! :)

Collectives™ on Stack Overflow

Find and Delete Duplicate rows in MySQL

3 Answers 3

Comments

9 Comments

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

9 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related