I have a table PEOPLE with columns 'firstName' and 'lastName' (varchars) and 'deleted' (bit), among others.

I want to delete the entries in this table whose 'deleted' flag is TRUE, but only if they share their exact firstName and lastName with another, separate entry in the table.

In other words, remove from the table 'deleted' people, but only if they are a duplicate.

I'm not sure how to do this, and especially how to do it quickly. Any help is appreciated, thanks.

  • How can you tell which is the duplicate and which is the primary record? Commented Jun 27, 2011 at 17:32
  • possible duplicate of Duplicate Entries in DB Commented Jun 27, 2011 at 17:34
  • All fields are in the same table? Commented Jun 27, 2011 at 17:34
  • @Antonio: All fields are in the same table Commented Jun 27, 2011 at 17:39

3 Answers

3
DELETE FROM people
WHERE EXISTS (
    SELECT *
    FROM people p2
    WHERE people.firstName = p2.firstName AND people.lastName = p2.lastName
    GROUP BY firstName, lastName
    HAVING COUNT(*)>1
)
AND deleted = 1 -- True

9 Comments

Do you need the WHERE clause in the nested statement?
Yes, to correlate the subquery with the outer table.
@niktrs, this will take a loooong time if the table is huge.
Won't this query delete all entries that have deleted = 1?
The question says "I want to delete from this table, entries that have the property TRUE for deleted, but only if they share their exact firstName and lastName with another", so we want rows with deleted = 1 whose (firstName, lastName) pair occurs more than once.
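
To check the question raised in these comments, here is a minimal sketch that runs the answer's query against a small in-memory SQLite database through Python's sqlite3 module. The sample rows and the `id` column are made up for illustration, SQLite is an assumption (the thread does not name a database), and the GROUP BY columns are qualified with `p2` only for clarity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    "id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT, deleted INTEGER)"
)
# Hypothetical sample data: one duplicated name, one unique 'deleted' row.
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", "Lee", 0),
        (2, "Ann", "Lee", 1),  # deleted and duplicated: should go
        (3, "Bob", "Ray", 1),  # deleted but unique: should stay
        (4, "Dee", "Fox", 0),
    ],
)

# The answer's query: the inner WHERE ties the subquery to the outer row,
# so COUNT(*) counts only rows sharing that row's name.
conn.execute("""
    DELETE FROM people
    WHERE EXISTS (
        SELECT *
        FROM people p2
        WHERE people.firstName = p2.firstName AND people.lastName = p2.lastName
        GROUP BY p2.firstName, p2.lastName
        HAVING COUNT(*) > 1
    )
    AND deleted = 1
""")

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM people ORDER BY id")]
print(remaining_ids)
```

In this sketch the unique 'deleted' row (id 3) survives, so the query does not remove every row with deleted = 1, only those whose name also appears on another row.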
1

If your table has a unique primary key (this will depend on your design), then this is a viable alternative to counting the occurrences of entries:

DELETE FROM people as A
WHERE deleted = 1
AND EXISTS (SELECT '1'
            FROM people as B
            WHERE B.id <> A.id
            AND A.firstName = B.firstName
            AND A.lastName = B.lastName)

This may perform slightly better than counting rows. Note that this query shares a potential issue with the previous answer: if two or more 'deleted' rows share a name and there is no 'non-deleted' row with that name, all of them will be removed (leaving you with no rows for that name!). If the intent is to remove 'deleted' rows only when there is a 'non-deleted' equivalent row, add AND B.deleted = 0 to the inner WHERE clause.
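
The difference between the two variants can be seen in a small sketch using Python's sqlite3 module. Everything here is illustrative: the sample rows, the helper name `remaining_after_delete`, and the choice of SQLite are assumptions. The outer alias `A` is also dropped in favor of the plain table name, since not every database accepts an alias on the DELETE target.

```python
import sqlite3

# Hypothetical data: Ann Lee has a live twin; both Cy Orr rows are 'deleted'.
ROWS = [
    (1, "Ann", "Lee", 0),
    (2, "Ann", "Lee", 1),
    (3, "Cy", "Orr", 1),
    (4, "Cy", "Orr", 1),
]


def remaining_after_delete(require_live_twin):
    """Run the EXISTS delete and return the ids that are left."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people ("
        "id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT, deleted INTEGER)"
    )
    conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", ROWS)
    # Optional extra condition from the answer's last sentence.
    extra = "AND B.deleted = 0" if require_live_twin else ""
    conn.execute(f"""
        DELETE FROM people
        WHERE deleted = 1
        AND EXISTS (SELECT 1
                    FROM people AS B
                    WHERE B.id <> people.id
                    AND people.firstName = B.firstName
                    AND people.lastName = B.lastName
                    {extra})
    """)
    ids = [row[0] for row in conn.execute("SELECT id FROM people ORDER BY id")]
    conn.close()
    return ids


print(remaining_after_delete(require_live_twin=False))
print(remaining_after_delete(require_live_twin=True))
```

Without the extra condition both Cy Orr rows disappear; with it they are kept, because neither has a 'non-deleted' counterpart.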

3 Comments

Great -- this one allows an easy AND B.deleted = 0 fix that I suspect the questioner wants, where the other doesn't.
Suggestion: Use A.id>B.id, so everything newer than the first record will be deleted. Also performs faster.
@niktrs - Unfortunately, that presumes that only later (or earlier) ids are ever 'deleted'. Depending on design and use, this assumption may or may not be valid. But yes, otherwise, that would likely perform better.
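
A small sketch of the assumption discussed in these comments, again using Python's sqlite3 module with made-up rows (SQLite, the helper name `remaining_ids`, and the data are all illustrative). Here the older row is the 'deleted' one, which is exactly the case where the `>` comparison finds nothing to remove.

```python
import sqlite3

# Hypothetical data: the row with the LOWER id is the 'deleted' one.
ROWS = [
    (1, "Ann", "Lee", 1),
    (2, "Ann", "Lee", 0),
]


def remaining_ids(comparison):
    """Delete 'deleted' duplicates using `people.id <comparison> B.id`."""
    if comparison not in ("<>", ">"):
        raise ValueError("comparison must be '<>' or '>'")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people ("
        "id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT, deleted INTEGER)"
    )
    conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", ROWS)
    conn.execute(f"""
        DELETE FROM people
        WHERE deleted = 1
        AND EXISTS (SELECT 1
                    FROM people AS B
                    WHERE people.id {comparison} B.id
                    AND people.firstName = B.firstName
                    AND people.lastName = B.lastName)
    """)
    ids = [row[0] for row in conn.execute("SELECT id FROM people ORDER BY id")]
    conn.close()
    return ids


print(remaining_ids("<>"))  # the 'deleted' row 1 is removed
print(remaining_ids(">"))   # row 1 has no twin with a smaller id, so it stays
```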
0

Here is a rudimentary way of doing it:

http://www.justin-cook.com/wp/2006/12/12/remove-duplicate-entries-rows-a-mysql-database-table/

Basically:
1. Create a new table with GROUP BY.
2. Delete old table.
3. Rename new table.
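
The three steps could be sketched roughly as follows, using Python's sqlite3 module. This is not necessarily what the linked article does: the table name `people_dedup`, the sample rows, and the use of MIN(deleted) to pick one flag per name are assumptions made for illustration. Note that this approach collapses all duplicates, not just the 'deleted' ones, and the rebuilt table loses any keys, indexes, and extra columns of the original.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (firstName TEXT, lastName TEXT, deleted INTEGER)")
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        ("Ann", "Lee", 0),
        ("Ann", "Lee", 1),
        ("Bob", "Ray", 1),
        ("Cy", "Orr", 1),
        ("Cy", "Orr", 1),
    ],
)

conn.executescript("""
    -- 1. Create a new table with one row per name.
    --    MIN(deleted) keeps the name 'live' if any of its rows was live.
    CREATE TABLE people_dedup AS
        SELECT firstName, lastName, MIN(deleted) AS deleted
        FROM people
        GROUP BY firstName, lastName;
    -- 2. Delete the old table.
    DROP TABLE people;
    -- 3. Rename the new table.
    ALTER TABLE people_dedup RENAME TO people;
""")

rebuilt_rows = conn.execute(
    "SELECT firstName, lastName, deleted FROM people ORDER BY firstName"
).fetchall()
print(rebuilt_rows)
```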
