1

I have a table with 3 columns: id, date and name. I want to delete the records that have a duplicate name, keeping only the record with the oldest date. For instance, in the example below there are 3 records with the name Paul, so I would like to keep the one with the oldest date (id = 1) and remove all the others (id = 4 and 6). I know how to write insert, update, etc. queries, but here I do not see how to make this work.

id, date, name

1, 2012-03-10, Paul
2, 2012-03-10, James
4, 2012-03-12, Paul
5, 2012-03-11, Ricardo
6, 2012-03-13, Paul

mysql_query(?);

6 Answers

1

The best suggestion I can give you is to create a unique index on name and avoid all the trouble.

Follow steps 2 and 3 from Peter Kiss's answer, then run this:

ALTER TABLE tablename ADD UNIQUE INDEX name (name);

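Then follow step 4: insert everything from the temporary table back into the original, using INSERT IGNORE. A plain INSERT stops with a duplicate-key error at the first repeated name, whereas INSERT IGNORE skips the offending row and carries on. Insert in date order if the oldest row for each name is the one that should survive.

From then on, any new duplicate rows will be omitted in the same way.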
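
The unique-index route can be tried out on the question's sample data. What follows is only a minimal sketch, and it rests on some assumptions: it uses Python's built-in sqlite3 module rather than MySQL, so `INSERT OR IGNORE` stands in for MySQL's `INSERT IGNORE`, and the names `tmp`, `idx_name` and `dedupe_with_unique_index` are made up for illustration. The rows are copied back in date order so that the oldest row per name is the one that gets in first.

```python
import sqlite3

# Sample rows from the question: (id, date, name)
ROWS = [
    (1, "2012-03-10", "Paul"),
    (2, "2012-03-10", "James"),
    (4, "2012-03-12", "Paul"),
    (5, "2012-03-11", "Ricardo"),
    (6, "2012-03-13", "Paul"),
]


def dedupe_with_unique_index(rows):
    """Return the ids left after re-inserting through a unique index on name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tablename (id INTEGER PRIMARY KEY, date TEXT, name TEXT)"
    )
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", rows)

    # Steps 2 and 3: copy everything to a temporary table, empty the original.
    conn.execute("CREATE TEMP TABLE tmp AS SELECT id, date, name FROM tablename")
    conn.execute("DELETE FROM tablename")

    # Add the unique index while the table is empty.
    conn.execute("CREATE UNIQUE INDEX idx_name ON tablename (name)")

    # Step 4: copy back, oldest first; rows with a repeated name are skipped.
    conn.execute(
        "INSERT OR IGNORE INTO tablename (id, date, name) "
        "SELECT id, date, name FROM tmp ORDER BY date, id"
    )
    kept = [row[0] for row in conn.execute("SELECT id FROM tablename ORDER BY id")]
    conn.close()
    return kept


print(dedupe_with_unique_index(ROWS))
```

The `ORDER BY date, id` is what decides which duplicate survives; without it the surviving row would depend on whatever order the rows happen to come back in.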


2 Comments

Hello Starx. I just had time to test what you are proposing this morning. I like it because it is simple. The thing is, when running an insert query, is there a way to know whether the record was not inserted because the value already existed in the table? I tried mysql_error, but nothing is returned in that case...
@Marc, you can look at MySQL's INSERT ... ON DUPLICATE KEY UPDATE
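
On the question of detecting a skipped insert, one option is to look at the affected-row count rather than the error message. The sketch below is an illustration under assumptions, not a MySQL recipe: it uses Python's sqlite3 module, where `INSERT OR IGNORE` plays the role of MySQL's `INSERT IGNORE`, and the table `people` and the helper `insert_name` are hypothetical. In the old PHP mysql extension the corresponding call would presumably be `mysql_affected_rows()`.

```python
import sqlite3


def insert_name(conn, name, date):
    """Try to insert a row; return True if it went in, False if it was skipped."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO people (date, name) VALUES (?, ?)",
        (date, name),
    )
    # rowcount is the number of rows the statement actually changed.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, date TEXT, name TEXT UNIQUE)"
)

first = insert_name(conn, "Paul", "2012-03-10")   # new name
second = insert_name(conn, "Paul", "2012-03-12")  # name already present
print(first, second)
```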
1
  1. Select all the records that you want to keep
  2. Insert them into a temporary table
  3. Delete everything from the original table
  4. Insert everything from the temporary table into the original
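
The four steps above can be sketched roughly as follows. This is only an illustration under assumptions: it runs on Python's sqlite3 module instead of MySQL, the temporary table name `keepers` and the function name are invented, and "the records you want to keep" is taken to mean the oldest row per name, with the lowest id breaking ties on date.

```python
import sqlite3

# Sample rows from the question: (id, date, name)
ROWS = [
    (1, "2012-03-10", "Paul"),
    (2, "2012-03-10", "James"),
    (4, "2012-03-12", "Paul"),
    (5, "2012-03-11", "Ricardo"),
    (6, "2012-03-13", "Paul"),
]


def rebuild_via_temp_table(rows):
    """Return the table contents after the select/copy/delete/re-insert steps."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tablename (id INTEGER PRIMARY KEY, date TEXT, name TEXT)"
    )
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", rows)

    # Steps 1 and 2: select the rows to keep and store them in a temporary table.
    # A row is kept when no other row with the same name is older
    # (or equally old with a lower id).
    conn.execute(
        """
        CREATE TEMP TABLE keepers AS
        SELECT t.id, t.date, t.name
        FROM tablename AS t
        WHERE NOT EXISTS (
            SELECT 1
            FROM tablename AS older
            WHERE older.name = t.name
              AND (older.date < t.date
                   OR (older.date = t.date AND older.id < t.id))
        )
        """
    )

    # Step 3: empty the original table.
    conn.execute("DELETE FROM tablename")

    # Step 4: copy the kept rows back and drop the temporary table.
    conn.execute("INSERT INTO tablename SELECT id, date, name FROM keepers")
    conn.execute("DROP TABLE keepers")

    result = conn.execute(
        "SELECT id, date, name FROM tablename ORDER BY id"
    ).fetchall()
    conn.close()
    return result


for row in rebuild_via_temp_table(ROWS):
    print(row)
```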

2 Comments

Hello Peter. Thanks for the reply. What do you think about the solution proposed by pritaeas?
The approach is good, but the solution is not. First you need a query with an ORDER BY clause, and then you can wrap it in a SELECT with a GROUP BY: SELECT id FROM (SELECT id, name FROM table ORDER BY date) AS t GROUP BY name. In this case the GROUP BY will keep the oldest rows from the result set.
1

Like Matt, but without the join:

DELETE FROM `table` WHERE `id` NOT IN (
    SELECT `id` FROM (
        SELECT `id` FROM `table` GROUP BY `name` ORDER BY `date`
    ) as A 
)

Without the intermediate SELECT (the derived table aliased as A), MySQL reports "You can't specify target table 'table' for update in FROM clause".

9 Comments

Hello pritaeas. Thanks for trying to help me out. What does the 'as A' mean?
@Marc A is an alias, i.e. a name for the inner query
@pritaeas Surely that will return all ids (And thus delete none), you need a top 1 or similar don't you?
The group/order by selects the oldest one per name. It worked with the test data he provided, records 1, 2 and 4 remained.
My MySQL server is totally overloaded at the moment, so I will not be able to test this for a while. As soon as I can, I will test it and report back on this post. Thanks to everyone in the meantime...
1

Something like this would work:

DELETE FROM tablename WHERE id NOT IN (
    SELECT tablename.id FROM (
        SELECT MIN(date) as dateCol, name FROM tablename GROUP BY name /*select the minimum date and name, for each name*/
    ) as MyInnerQuery 
    INNER JOIN tablename on MyInnerQuery.dateCol = tablename.date 
        and MyInnerQuery.name = tablename.name /*select the id joined on the minimum date and the name*/
) /*Delete everything which isn't in the list of ids which have the minimum date for each name*/
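
One thing to watch for in MySQL: the subquery above reads tablename directly while the DELETE targets the same table, which MySQL normally rejects with "You can't specify target table ... for update in FROM clause". A possible workaround is to wrap the id list in one more derived table, as in the sketch below. The sketch is run through Python's sqlite3 module only to check the logic on the sample data; SQLite does not have this restriction, so it says nothing about how a particular MySQL version will treat the statement. The alias `keep_ids` and the function name are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (id, date, name)
ROWS = [
    (1, "2012-03-10", "Paul"),
    (2, "2012-03-10", "James"),
    (4, "2012-03-12", "Paul"),
    (5, "2012-03-11", "Ricardo"),
    (6, "2012-03-13", "Paul"),
]

# Same logic as the answer, with the id list wrapped in an extra derived table.
WRAPPED_DELETE = """
DELETE FROM tablename WHERE id NOT IN (
    SELECT id FROM (
        SELECT t.id AS id
        FROM (
            SELECT MIN(date) AS dateCol, name FROM tablename GROUP BY name
        ) AS MyInnerQuery
        INNER JOIN tablename AS t
            ON MyInnerQuery.dateCol = t.date
           AND MyInnerQuery.name = t.name
    ) AS keep_ids
)
"""


def run_wrapped_delete(rows):
    """Load the sample rows, run the delete, and return the remaining ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tablename (id INTEGER PRIMARY KEY, date TEXT, name TEXT)"
    )
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", rows)
    conn.execute(WRAPPED_DELETE)
    remaining = [
        row[0] for row in conn.execute("SELECT id FROM tablename ORDER BY id")
    ]
    conn.close()
    return remaining


print(run_wrapped_delete(ROWS))
```

If two rows with the same name share the same oldest date, both of them match the join and both are kept.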

2 Comments

Hello Matt. Thanks. I will need a little time to test that out... I'll come back.
Hello Matt. I had time to test this out this morning and unfortunately it is not working. I understand the logic of your solution, but I do not see why it is not working...
0
DELETE t
FROM tableX AS t
  LEFT JOIN
    ( SELECT name
           , MIN(date) AS first_date
      FROM tableX
      GROUP BY name
    ) AS grp
    ON  grp.name = t.name
    AND grp.first_date = t.date
WHERE
    grp.name IS NULL
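
Before running a multi-table DELETE like the one above, it may help to preview which rows the anti-join picks out. The sketch below turns the same LEFT JOIN ... IS NULL condition into a SELECT and runs it on the question's sample data. It uses Python's sqlite3 module as a stand-in (SQLite does not accept the `DELETE t FROM ...` form at all), and the function name is hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, date, name)
ROWS = [
    (1, "2012-03-10", "Paul"),
    (2, "2012-03-10", "James"),
    (4, "2012-03-12", "Paul"),
    (5, "2012-03-11", "Ricardo"),
    (6, "2012-03-13", "Paul"),
]

# Rows with no match in grp are the ones whose date is not the oldest
# for their name, i.e. the rows the DELETE would remove.
PREVIEW_SQL = """
SELECT t.id
FROM tableX AS t
LEFT JOIN (
    SELECT name, MIN(date) AS first_date
    FROM tableX
    GROUP BY name
) AS grp
    ON grp.name = t.name
   AND grp.first_date = t.date
WHERE grp.name IS NULL
ORDER BY t.id
"""


def ids_to_delete(rows):
    """Return the ids that the anti-join marks for deletion."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tableX (id INTEGER PRIMARY KEY, date TEXT, name TEXT)"
    )
    conn.executemany("INSERT INTO tableX VALUES (?, ?, ?)", rows)
    doomed = [row[0] for row in conn.execute(PREVIEW_SQL)]
    conn.close()
    return doomed


print(ids_to_delete(ROWS))
```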

0
DELETE FROM thetable tt
WHERE EXISTS (
    -- delete a row when an older row with the same name exists
    SELECT *
    FROM thetable tx
    WHERE tx.thename = tt.thename
    AND tx.thedate < tt.thedate
    );

(Note that "date" is a reserved word (a type) in SQL, and "name" is a reserved word in some SQL implementations.)
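
The correlated EXISTS idea can be checked on the question's sample data with a small sketch. It is written so that a row is deleted when an older row with the same name exists, which leaves the oldest row per name. The assumptions: it runs on Python's sqlite3 module, not MySQL (where a DELETE whose subquery reads the same table may be refused), it refers to the outer row by table name instead of an alias, and the function name is invented for illustration.

```python
import sqlite3

# Sample rows from the question, using the renamed columns: (id, thedate, thename)
ROWS = [
    (1, "2012-03-10", "Paul"),
    (2, "2012-03-10", "James"),
    (4, "2012-03-12", "Paul"),
    (5, "2012-03-11", "Ricardo"),
    (6, "2012-03-13", "Paul"),
]

# Delete every row for which an older row with the same name exists.
EXISTS_DELETE = """
DELETE FROM thetable
WHERE EXISTS (
    SELECT 1
    FROM thetable AS tx
    WHERE tx.thename = thetable.thename
      AND tx.thedate < thetable.thedate
)
"""


def delete_newer_duplicates(rows):
    """Load the sample rows, run the delete, and return the remaining ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE thetable (id INTEGER PRIMARY KEY, thedate TEXT, thename TEXT)"
    )
    conn.executemany("INSERT INTO thetable VALUES (?, ?, ?)", rows)
    conn.execute(EXISTS_DELETE)
    remaining = [
        row[0] for row in conn.execute("SELECT id FROM thetable ORDER BY id")
    ]
    conn.close()
    return remaining


print(delete_newer_duplicates(ROWS))
```

As with the MIN(date) variants, rows that tie on the oldest date for a name are all kept, since none of them has a strictly older counterpart.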
