36

This is probably very easy, but it's Monday morning. I have two tables:

Table1:

Field        | Type             | Null | Key | Default | Extra
id           | int(32) unsigned | NO   | PRI | NULL    | auto_increment
group        | int(32)          | NO   |     | 0       |                

Table2:

Field     | Type             | Null | Key | Default | Extra
group     | int(32)          | NO   |     | 0       | 

Ignoring other fields, I would like a single SQL DELETE statement that deletes every row in Table1 for which some row in Table2 has Table2.group equal to Table1.group. That is, a row of Table1 with group=69 should be deleted if and only if Table2 contains a row with group=69.

Thank you for any help.

6 Answers

49

I think this is what you want:

DELETE FROM `table1`
WHERE `group` in (SELECT DISTINCT `group` FROM `table2`)
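A minimal sketch for trying the statement above without a MySQL server, assuming Python's built-in sqlite3 module and an in-memory database. The sample group values are made up for illustration, and SQLite quotes the reserved word group with double quotes where MySQL uses backticks, so treat this as a check of the logic rather than of MySQL behavior or performance.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        "group" INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE table2 ("group" INTEGER NOT NULL DEFAULT 0);
    -- Hypothetical sample data: groups 69 and 72 appear in table2.
    INSERT INTO table1 ("group") VALUES (69), (70), (69), (71);
    INSERT INTO table2 ("group") VALUES (69), (69), (72);
""")

# Delete every table1 row whose group also occurs in table2.
conn.execute('DELETE FROM table1 WHERE "group" IN (SELECT "group" FROM table2)')

remaining = [row[0] for row in conn.execute('SELECT "group" FROM table1 ORDER BY id')]
print(remaining)
```

Only the rows whose group has no counterpart in table2 should survive the delete.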

5 Comments

That's simple and very effective. It works in SQL Server too (but without the backticks).
P.S. Remember not to do it the wrong way around ;) That would be a real killer.
This is indeed a solution, but certainly not the only one and most definitely not the best-performing one. The IN operator must always be used with caution on very big tables. A better solution is the one provided by @BT26, which uses an inner join.
As @kuklei has said, this is very resource intensive, and although it works, it is essentially unusable for anything but small tables. We have a table with 1.3m rows, and even when limiting the delete to 500 rows, it takes more than 5 minutes just to prepare the query. The answer by BT26 below is the correct answer.
As the previous comments suggested, this only works for a limited number of rows. Consider that it turns the entire subquery result into one big IN list. BT26's solution is good, especially for the case specifically called out in the OP. However, I'd recommend @ArnoBrinkman's solution in general (the EXISTS example), as it will perform better if table2 has many rows for the field joined on.
23

I think this way is faster:

DELETE FROM t1 USING table1 t1 INNER JOIN table2 t2 ON ( t1.group = t2.group );

3 Comments

In SQL Server this does not seem to work. SSMS reports the following for the query above: Incorrect syntax near 'USING'.
In SQL Server the syntax is: DELETE FROM t1 FROM t1 INNER JOIN t2 ON t1.ID = t2.ID. This deletes all rows from t1 that also exist in t2, matched on ID; further conditions can be added to the join with AND as usual. Note the second FROM: it is not a mistake. It introduces the join and plays the role that USING plays in the MySQL statement.
If you are deleting from really large InnoDB tables, you might consider increasing your innodb_buffer_pool_size. The query first failed after 30 minutes of execution for me, so I increased the pool to 512MB and then succeeded in removing some 13 million extra records. The tables I was joining had 44 million and 23 million records, in the order they appear in the query.
8

The cleanest solution is to write the SQL just the way you phrased it yourself:

DELETE FROM Table1
WHERE
  EXISTS(SELECT 1 FROM Table2 WHERE Table2.Group = Table1.Group)

Regards, Arno Brinkman

2 Comments

What I like about this solution is that the match condition lives in a WHERE clause. This means it still works if you want to delete only rows that match across several columns in the two tables.
@ArnoBrinkman Would adding LIMIT 1 help, as in EXISTS(SELECT 1 FROM Table2 WHERE Table2.Group = Table1.Group LIMIT 1)?
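To illustrate the multi-column point from the comments, here is a small sketch, again assuming Python's sqlite3 module and an in-memory database rather than MySQL. The extra column kind and all sample values are hypothetical and not part of the original tables; a row is deleted only when both group and kind have a counterpart in table2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "kind" is a hypothetical second column added for this illustration.
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, "group" INTEGER NOT NULL, kind TEXT NOT NULL);
    CREATE TABLE table2 ("group" INTEGER NOT NULL, kind TEXT NOT NULL);
    INSERT INTO table1 (id, "group", kind) VALUES (1, 69, 'a'), (2, 69, 'b'), (3, 70, 'a');
    INSERT INTO table2 ("group", kind) VALUES (69, 'a'), (70, 'b');
""")

# Correlated EXISTS: both columns must match for a row to be deleted.
cur = conn.execute("""
    DELETE FROM table1
    WHERE EXISTS (
        SELECT 1 FROM table2
        WHERE table2."group" = table1."group"
          AND table2.kind = table1.kind
    )
""")
deleted = cur.rowcount

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM table1 ORDER BY id")]
print(deleted, remaining_ids)
```

Here only row 1 matches on both columns, so rows 2 and 3 should remain even though their group or kind values appear somewhere in table2.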
7

Something like this

delete from table1 where `group` in (select `group` from table2)

2 Comments

I'd select distinct in the subquery
Isn't using distinct just more work to remove duplicates from the subquery? I'd leave it off.
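On the DISTINCT question, the set of deleted rows is the same either way, because IN only tests membership. The sketch below checks that on made-up data, assuming Python's sqlite3 module with an in-memory database; the helper remaining_after is hypothetical. It says nothing about speed, which depends on the database engine's optimizer.

```python
import sqlite3

def remaining_after(delete_sql):
    """Build a fresh sample database, run one DELETE, return surviving groups."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (id INTEGER PRIMARY KEY, "group" INTEGER NOT NULL);
        CREATE TABLE table2 ("group" INTEGER NOT NULL);
        -- Hypothetical data: group 2 is duplicated in table2.
        INSERT INTO table1 ("group") VALUES (1), (2), (2), (3);
        INSERT INTO table2 ("group") VALUES (2), (2), (2), (4);
    """)
    conn.execute(delete_sql)
    rows = [row[0] for row in conn.execute('SELECT "group" FROM table1 ORDER BY id')]
    conn.close()
    return rows

plain = remaining_after(
    'DELETE FROM table1 WHERE "group" IN (SELECT "group" FROM table2)')
with_distinct = remaining_after(
    'DELETE FROM table1 WHERE "group" IN (SELECT DISTINCT "group" FROM table2)')

print(plain == with_distinct)
```

Duplicates in the subquery do not change which rows match, so DISTINCT affects at most how the engine evaluates the subquery, not the result.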
1

Off the top of my head:

delete from Table1 where id in (select Table1.id from Table1 inner join Table2 on Table1.group = Table2.group)

I did this a little differently than the other posters -- I think this might be better if there is a large number of rows in Table2. Can someone please set me straight on that?

1 Comment

It still goes back to an IN list, which is problematic. Consider the @ArnoBrinkman example or BT26's; these do not require the engine to build a large array. For example, you could change "where id in" to "where exists" and it should probably work identically, but much faster, and it would work at all for large volumes.
1

You can delete rows from either table by naming its alias in a simple join query, like

delete a from table1 a,table2 b where a.uid=b.id and b.id=57;

Here you can specify either a or b after delete to remove the matching rows from the corresponding table.
