3

I'm quite sloppy with databases, can't get this working with joins, and I'm not even sure that would be faster...

DELETE FROM atable 
WHERE  btable_id IN (SELECT id 
                     FROM   btable 
                     WHERE  param > 2) 
       AND ctable_id IN (SELECT id 
                         FROM   ctable 
                         WHERE  ( someblob LIKE '%_ID1_%' 
                                  OR someblob LIKE '%_ID2_%' )) 

atable contains ~19M rows, and this would delete ~3M of them. At the moment I can only run the query with LIMIT 100000, and I don't want to sit here with phpMyAdmin all day, because each deletion (of 100,000 rows) takes about 1.5 minutes.

Any ways to speed this up / automate it?

MySQL 5.5

(do you think it's already bad DB design if any table contains 20M rows?)

1
  • The number of rows is not a measure for good or bad table design. Are your tables normalized? THAT would be a measure for good table design... Commented Jan 8, 2014 at 13:03

4 Answers

2

Use EXISTS or JOIN instead of IN to improve performance.

Using EXISTS:

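-- MySQL 5.5 does not accept an alias on the target of a single-table DELETE,
-- so the outer table is referenced by its full name
DELETE FROM atable 
WHERE EXISTS (SELECT 1 FROM btable b WHERE b.id = atable.btable_id AND b.param > 2) AND 
      EXISTS (SELECT 1 FROM ctable c WHERE c.id = atable.ctable_id AND (c.someblob LIKE '%_ID1_%' OR c.someblob LIKE '%_ID2_%'))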

Using JOIN:

DELETE a 
FROM atable a 
INNER JOIN btable b ON a.btable_id = b.id AND b.param > 2
INNER JOIN ctable c ON a.ctable_id = c.id AND (c.someblob LIKE '%_ID1_%' OR c.someblob LIKE '%_ID2_%')

1 Comment

The one with EXISTS doesn't seem to improve things, well, maybe a couple seconds on average. My problem with the second one is, I can't specify a LIMIT, and this way it just times out. Thanks anyways!
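
The multi-table form of DELETE does not accept LIMIT in MySQL, but it can be batched another way. A minimal sketch, assuming atable has an integer primary key named id: bound each pass by a key range and loop inside a stored procedure. The procedure name purge_atable, its batch_size parameter, and the variables are hypothetical, and DELIMITER is a command of the mysql command-line client, so other tools may need a different way to set the statement delimiter.

```sql
-- Sketch only: assumes atable.id is an integer primary key.
-- purge_atable and batch_size are hypothetical names.
DELIMITER //

CREATE PROCEDURE purge_atable(IN batch_size INT)
BEGIN
  DECLARE lo BIGINT;
  DECLARE max_id BIGINT;

  -- Find the key range to walk; both stay NULL if atable is empty,
  -- in which case the loop below is never entered.
  SELECT MIN(id), MAX(id) INTO lo, max_id FROM atable;

  WHILE lo <= max_id DO
    -- No LIMIT here, so each pass is bounded by a slice of ids instead.
    DELETE a
    FROM atable a
    INNER JOIN btable b ON a.btable_id = b.id AND b.param > 2
    INNER JOIN ctable c ON a.ctable_id = c.id
          AND (c.someblob LIKE '%_ID1_%' OR c.someblob LIKE '%_ID2_%')
    WHERE a.id >= lo AND a.id < lo + batch_size;

    SET lo = lo + batch_size;
  END WHILE;
END //

DELIMITER ;

CALL purge_atable(100000);
```

With autocommit on, each DELETE commits on its own, so an interrupted run keeps the batches already done. The CALL as a whole still runs for a long time, so it is probably better started from a client without a web request timeout than from phpMyAdmin.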
1

First, try EXISTS instead of IN; it is faster in many cases.

Then you could try an INNER JOIN instead of IN or EXISTS.

Example :

delete a 
from a 
inner join b on b.id = a.tablebid

Finally, if possible (I don't know whether you also have ID3 and so on), try replacing the OR with something else. Sometimes an odd or more complicated rewrite helps the optimizer: CASE WHEN, a subquery...
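
One rewrite along these lines, offered as a sketch rather than a tested fix: collect the matching ctable ids once, so the two LIKE scans are not repeated for every LIMIT batch. The helper table name tmp_c_ids is hypothetical, and the sketch assumes ctable.id is unique (for instance the primary key).

```sql
-- Hypothetical helper table. A TEMPORARY table exists only for the current
-- connection, so all three statements must run in the same session.
CREATE TEMPORARY TABLE tmp_c_ids AS
SELECT id
FROM   ctable
WHERE  someblob LIKE '%_ID1_%'
        OR someblob LIKE '%_ID2_%';

-- Index the helper so each check from atable is a key lookup.
ALTER TABLE tmp_c_ids ADD PRIMARY KEY (id);

-- Same filter as the original query; repeat until it affects 0 rows.
DELETE FROM atable
WHERE  btable_id IN (SELECT id FROM btable WHERE param > 2)
       AND ctable_id IN (SELECT id FROM tmp_c_ids)
LIMIT  100000;
```

A tool that opens a new connection for every request (phpMyAdmin usually does) would lose the temporary table between statements; in that case a regular table that is dropped afterwards would serve the same purpose.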


1

I don't see where a simple index would help much. I'd do:

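delete from atable where id in (
    -- MySQL rejects a subquery that reads the table being deleted from
    -- (error 1093), so the id list is wrapped in a derived table
    select id from (
        select
            a.id
        from
            atable a
            join btable b on a.btable_id = b.id
            join ctable c on a.ctable_id = c.id
        where
            b.param > 2
            and (
                c.someblob LIKE '%_ID1_%' 
                OR c.someblob LIKE '%_ID2_%'
            )
    ) as matching_ids
)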

Correction: I'm assuming you've got indexes on the id columns of btable and ctable (probably, if they're primary keys...) and on b.param (if it's numeric).


1

Besides optimizing the query, you could also look at making good use of indexes, since they might prevent a full table scan.

For btable, for example, create an index on id and param.

To explain why this helps: if the database has to look up the id and param values in the table in unsorted order, it has to read ALL rows. If it reads the index, which is SORTED, it can look up id and param at reduced cost.
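
As a sketch of what that could look like: the index name below is made up, and the column order (param first, so the range condition param > 2 can use the index) is my assumption rather than something stated above. MySQL 5.5 can only EXPLAIN a SELECT, so the check uses a SELECT with the same joins and filters as the DELETE.

```sql
-- Hypothetical index name; param leads so "param > 2" can use a range scan.
CREATE INDEX idx_btable_param_id ON btable (param, id);

-- Inspect the plan for the rows the DELETE would touch.
EXPLAIN
SELECT a.btable_id, a.ctable_id
FROM   atable a
       INNER JOIN btable b ON a.btable_id = b.id AND b.param > 2
       INNER JOIN ctable c ON a.ctable_id = c.id
WHERE  c.someblob LIKE '%_ID1_%'
        OR c.someblob LIKE '%_ID2_%';
```

In the EXPLAIN output, the key column shows which index, if any, is used for each table. The LIKE patterns start with a wildcard, so an ordinary index on someblob would not help that part of the filter.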

