How to delete duplicate data based on two columns

Question

I want a query that removes duplicate rows, keeping only one row from each group whose two columns match. Possibly because the table holds a lot of data, the following query runs for a long time without finishing:


DELETE t1 FROM table t1 INNER JOIN table t2 
WHERE t1.idx < t2.idx AND t1.Nm = t2.Nm AND t1.product = t2.product;

Can this query do what I want? If not, is there another way?

You should only join rows that are the same: DELETE t1 FROM table t1 INNER JOIN table t2 ON (t1.Nm = t2.Nm AND t1.product = t2.product) WHERE t1.idx < t2.idx;
– Thallius, Commented Apr 14, 2021 at 12:46
I tried what you suggested, but it has been running for more than 5 minutes. The table has about 30,000 rows; should it take this long?
– kwsong0314, Commented Apr 14, 2021 at 12:56

forpas · Accepted Answer · 2021-04-14 13:36:10Z

Create an index on the three columns used in the join condition:

CREATE INDEX idx_name
ON tablename (Nm, product, idx);

and execute the query like this:

DELETE t1 FROM tablename t1 INNER JOIN tablename t2
ON t1.Nm = t2.Nm AND t1.product = t2.product
WHERE t1.idx < t2.idx;

As you can see in this simplified demo, the query will be executed using the index.
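
Since the linked demo is not reproduced here, the following is a minimal sketch of the same idea, assuming MySQL 5.6 or later (needed for EXPLAIN on a DELETE statement). The column types and the sample rows are hypothetical; only the table name, the column names, and the index definition come from the answer above.

```sql
-- Hypothetical sample table; column types are assumptions.
CREATE TABLE tablename (
  idx     INT PRIMARY KEY,
  Nm      VARCHAR(50) NOT NULL,
  product VARCHAR(50) NOT NULL
);

-- Hypothetical rows: ('kim', 'apple') appears three times.
INSERT INTO tablename (idx, Nm, product) VALUES
  (1, 'kim', 'apple'),
  (2, 'kim', 'apple'),
  (3, 'kim', 'pear'),
  (4, 'lee', 'apple'),
  (5, 'kim', 'apple');

-- Index from the answer: the two compared columns first, then idx.
CREATE INDEX idx_name ON tablename (Nm, product, idx);

-- Inspect the plan before deleting anything; the "key" column
-- should list idx_name if the optimizer chooses the index.
EXPLAIN
DELETE t1 FROM tablename t1 INNER JOIN tablename t2
ON t1.Nm = t2.Nm AND t1.product = t2.product
WHERE t1.idx < t2.idx;

-- Preview how many rows the DELETE would remove.
SELECT COUNT(DISTINCT t1.idx) AS rows_to_delete
FROM tablename t1 INNER JOIN tablename t2
ON t1.Nm = t2.Nm AND t1.product = t2.product
WHERE t1.idx < t2.idx;

-- Delete every row that has a same-(Nm, product) row with a larger idx,
-- so only the row with the highest idx in each group survives.
DELETE t1 FROM tablename t1 INNER JOIN tablename t2
ON t1.Nm = t2.Nm AND t1.product = t2.product
WHERE t1.idx < t2.idx;

-- Remaining rows: idx 3, 4 and 5.
SELECT idx, Nm, product FROM tablename ORDER BY idx;
```

Note that this form keeps the row with the largest idx in each duplicate group. Changing the comparison to t1.idx > t2.idx would keep the smallest idx instead.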
