
I want to write a query that deletes duplicate rows, keeping only one row from each set of rows that match on two columns (Nm and product). The following query runs for a long time without finishing, perhaps because the table holds a lot of data:


DELETE t1 FROM table t1 INNER JOIN table t2 
WHERE t1.idx < t2.idx AND t1.Nm = t2.Nm AND t1.product = t2.product;

Can this query do what I want? If not, what other way is there?

  • You should only join rows which are the same: DELETE t1 FROM table t1 INNER JOIN table t2 ON (t1.Nm = t2.Nm AND t1.product = t2.product) WHERE t1.idx < t2.idx; (Commented Apr 14, 2021 at 12:46)
  • I tried it the way you suggested, but it has been running for more than 5 minutes. The table has only about 30,000 rows; should it take this long? (Commented Apr 14, 2021 at 12:56)

1 Answer


Create an index on the 3 columns involved in the join conditions:

CREATE INDEX idx_name
ON tablename (Nm, product, idx);

and execute the query like this:

DELETE t1 FROM tablename t1 INNER JOIN tablename t2 
WHERE t1.Nm = t2.Nm AND t1.product = t2.product AND t1.idx < t2.idx;

As you can see in this simplified demo, the query will be executed using the index.
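To check which row of each duplicate group survives, here is a small sketch you can run locally. It is not MySQL: it assumes Python's built-in sqlite3 module as a stand-in, and because SQLite has no multi-table DELETE, the self-join is rewritten as an equivalent EXISTS subquery. The helper name dedupe_keep_highest_idx and the sample rows are made up for illustration. The condition t1.idx < t2.idx means a row is deleted whenever a matching row with a higher idx exists, so the row with the highest idx in each (Nm, product) group should be the one that remains.

```python
import sqlite3


def dedupe_keep_highest_idx(rows):
    """Load rows into an in-memory table, delete duplicates, return survivors.

    Hypothetical helper for illustration; `rows` is a list of
    (idx, Nm, product) tuples.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tablename (idx INTEGER PRIMARY KEY, Nm TEXT, product TEXT)"
    )
    conn.executemany(
        "INSERT INTO tablename (idx, Nm, product) VALUES (?, ?, ?)", rows
    )
    # Same composite index as suggested in the answer.
    conn.execute("CREATE INDEX idx_name ON tablename (Nm, product, idx)")
    # SQLite stand-in for the MySQL self-join DELETE: remove every row for
    # which another row with the same (Nm, product) and a higher idx exists.
    conn.execute(
        """
        DELETE FROM tablename
        WHERE EXISTS (
            SELECT 1
            FROM tablename AS t2
            WHERE t2.Nm = tablename.Nm
              AND t2.product = tablename.product
              AND t2.idx > tablename.idx
        )
        """
    )
    conn.commit()
    survivors = conn.execute(
        "SELECT idx, Nm, product FROM tablename ORDER BY idx"
    ).fetchall()
    conn.close()
    return survivors


if __name__ == "__main__":
    sample = [
        (1, "a", "x"),
        (2, "a", "x"),
        (3, "b", "y"),
        (4, "a", "x"),
        (5, "b", "z"),
    ]
    for row in dedupe_keep_highest_idx(sample):
        print(row)
```

If you would rather keep the lowest idx in each group, flip the comparison (t1.idx > t2.idx in the MySQL query).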
