How to optimize a delete query with a subselect?

Question

This query needs to delete over 17 million rows, from a table containing 20 million.

DELETE
FROM statements
WHERE agreement_id IN
    (SELECT id
     FROM agreements
     WHERE created < DATE_SUB(CURDATE(), INTERVAL 6 MONTH));


DELETE
FROM agreements
WHERE created < DATE_SUB(CURDATE(), INTERVAL 6 MONTH);

It takes hours to run. Am I missing something that could speed things up a bit?

The subselect by itself takes only a few seconds, so I don't understand why the delete takes so long.

Can we see the structure of the statements table? Maybe you should put an index on agreement_id.
– Kerkouch, Commented Jan 10, 2019 at 19:49
Every query-optimization question should include the output of SHOW CREATE TABLE <tablename> for each table referenced in the query. Help us help you; don't make us guess which data types and indexes you currently have.
– Bill Karwin, Commented Jan 10, 2019 at 21:36
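
For reference, the information the commenters ask for could be gathered with something like the following. This is only a sketch: it assumes the table and column names from the question, and it assumes MySQL 5.6.3 or later, where EXPLAIN accepts DELETE statements. EXPLAIN only shows the plan; it does not delete anything.

```sql
-- Table definitions, including column types and existing indexes
SHOW CREATE TABLE statements;
SHOW CREATE TABLE agreements;

-- Execution plan for the slow delete (the statement is not executed)
EXPLAIN
DELETE
FROM statements
WHERE agreement_id IN
    (SELECT id
     FROM agreements
     WHERE created < DATE_SUB(CURDATE(), INTERVAL 6 MONTH));
```

A plan that scans all of statements and evaluates the subquery per row would point to a missing index or an unoptimized subquery.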

Derviş Kayımbaşıoğlu · Accepted Answer · 2019-01-10 19:57:51Z

1

If you have this many rows to delete, I suggest that you:

Create a new temporary table with the data that will stay.
Truncate your main table.
Move the data from the temporary table back to your main table.

or

Create a new temporary table with the data that will stay.
Drop your main table.
Rename your temporary table to the main table's name (don't forget to recreate the constraints).

Also, for your query:

Avoid the IN clause on big data sets. Use EXISTS instead, which usually performs better.

Basic script:

-- Copy only the rows that will stay: statements whose agreement
-- is NOT older than six months.
CREATE TABLE tmp_statements AS
  SELECT * FROM statements s WHERE NOT EXISTS
  (
     SELECT 1 FROM agreements a
     WHERE
       a.created < DATE_SUB(CURDATE(), INTERVAL 6 MONTH) AND
       a.id = s.agreement_id
  );

DROP TABLE statements;

RENAME TABLE tmp_statements TO statements;

-- Don't forget to recreate your indexes and constraints.

edited Jan 10, 2019 at 19:57

answered Jan 10, 2019 at 19:49

Derviş Kayımbaşıoğlu


1 Comment

Stephane Gosselin Over a year ago

As I wrote in my comment on the other answer that suggests the same approach, I was sceptical when I first read this. I needed to build a minimal dataset for local dev and, after trying alternative methods, I went with this approach. It is by far the quickest (and a tad dirty) way to do it. Thanks!

sticky bit · Accepted Answer · 2019-01-10 22:32:48Z

1

Try to rewrite the first statement to use EXISTS.

DELETE FROM statements
WHERE EXISTS (SELECT *
              FROM agreements
              WHERE agreements.id = statements.agreement_id
                    AND agreements.created < date_sub(curdate(), interval 6 month));

And put an index on agreements (id, created) (if not already there).

CREATE INDEX agreements_id_created
             ON agreements
                (id,
                 created);

For the second one, create an index on agreements (created) (if not already there).

CREATE INDEX agreements_created
             ON agreements
                (created);

answered Jan 10, 2019 at 22:32

sticky bit


Rick James · Accepted Answer · 2019-01-10 23:18:07Z

1

Use a "multi-table delete" instead of the usually inefficient IN ( SELECT ... ).
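
As a sketch of what that could look like for the first statement in the question (assuming, as the question's subselect implies, that statements.agreement_id refers to agreements.id):

```sql
-- Multi-table DELETE: remove only rows from statements (alias s),
-- selected by joining to the agreements older than six months.
DELETE s
FROM statements AS s
JOIN agreements AS a
  ON a.id = s.agreement_id
WHERE a.created < DATE_SUB(CURDATE(), INTERVAL 6 MONTH);
```

Only the alias named right after DELETE loses rows; agreements is read but not modified, so the second DELETE from the question is still needed afterwards.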

Several techniques for large deletes are discussed here.

To delete 85% of the table, it is really best to build a new table with the 15% you are keeping, then swap the table into place. (More on that in the link above.)
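
A minimal sketch of that copy-and-swap, not a definitive script. The names statements_new and statements_old are hypothetical, and it assumes nothing writes to statements while the copy runs. CREATE TABLE ... LIKE copies column definitions and indexes but not foreign key definitions, so those would have to be added back separately.

```sql
-- Empty copy of the table structure, indexes included (no foreign keys)
CREATE TABLE statements_new LIKE statements;

-- Copy only the rows to keep: statements whose agreement is
-- not older than six months
INSERT INTO statements_new
SELECT s.*
FROM statements AS s
WHERE NOT EXISTS
  (SELECT 1
   FROM agreements AS a
   WHERE a.id = s.agreement_id
     AND a.created < DATE_SUB(CURDATE(), INTERVAL 6 MONTH));

-- Swap both names in one atomic RENAME, then drop the old data
RENAME TABLE statements TO statements_old,
             statements_new TO statements;

DROP TABLE statements_old;
```

Keeping statements_old until the new table has been checked leaves a way back if the filter turns out to be wrong.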

answered Jan 10, 2019 at 23:18

Rick James


2 Comments

Stephane Gosselin Over a year ago

Upon reading this I was sceptical. I needed to build a minimal dataset for local dev and, after trying alternative methods, I went with this approach. It is by far the quickest (and a tad dirty) way to do it.

Rick James Over a year ago

@stefgosselin - In real life, kludges are sometimes OK. (I assume you were talking about the "copy rows to keep" method?)
