2

I've got some MySQL tables with redundant data that I need to remove. For example:

 id email            date       data...
 1  [email protected] 2012-01-01 my_data
 2  [email protected] 2012-01-01 my_data
 3  [email protected] 2012-01-02 my_data
 4  [email protected] 2012-01-02 my_data   (redundant)
 5  [email protected] 2012-01-02 my_data

I need to DELETE the redundant rows, but I'd like to select them first. I found this on Stack Overflow, but it requires specifying the email address:

SELECT * 
FROM `my_table`
WHERE `id` IN (SELECT `id` 
               FROM `my_table` 
               WHERE `email` = '[email protected]'
               GROUP BY `date`
               HAVING count(*) > 1)

What query like the one above can I use that does not need the WHERE qualifier in the embedded query, so that I can run it for all email addresses?

The query can be a SELECT query. I don't mind removing the rows manually in phpMyAdmin.

2
  • "What query can I use like above that does not use the WHERE qualifier in the embedded query so I can do it for all email addresses?" By this, do you mean removing all the duplicates and keeping just a single copy? Commented Oct 28, 2012 at 7:15
  • Yes, remove all the duplicate rows, but keep the original row. Commented Oct 28, 2012 at 16:59

2 Answers

7
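DELETE FROM tableName
WHERE ID NOT IN
(
    SELECT minID
    FROM
    (
        -- the lowest id in each (email, date) group is the row to keep
        SELECT email, date, MIN(id) minID
        FROM tableName
        GROUP BY email, date
    ) x  -- derived table: MySQL rejects a subquery that reads directly from the table being deleted from
)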

Or, by using a JOIN:

DELETE a 
FROM tableName a
    LEFT JOIN (
            SELECT minID
            FROM (
                    SELECT email, date, MIN(id) minID
                    FROM tableName
                    GROUP BY email, date
                    ) y
            ) x
            ON a.ID = x.minID
WHERE x.minID IS NULL;
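
For comparison, the same cleanup is often written as a self-join. This is a different formulation from the two statements above, not a correction of them. The sketch assumes that `id` is unique, that the lowest `id` in each group is the row to keep, and that the table uses a transactional engine such as InnoDB so the delete can be rolled back; the alias `cnt` is only illustrative.

```sql
-- Sketch only: assumes tableName(id, email, date) with a unique id,
-- and a transactional engine (e.g. InnoDB) so ROLLBACK can undo the delete.
START TRANSACTION;

-- Delete every row that has a same-email, same-date sibling with a lower id.
-- The lowest id of each (email, date) group has no such sibling and survives.
DELETE a
FROM tableName a
    JOIN tableName b
        ON  a.email = b.email
        AND a.date  = b.date
        AND a.id    > b.id;

-- Check: this should return no rows once the duplicates are gone.
SELECT email, date, COUNT(*) AS cnt
FROM tableName
GROUP BY email, date
HAVING COUNT(*) > 1;

COMMIT;  -- or ROLLBACK; if the check above does not look right
```

Note that rows where `email` or `date` is NULL never satisfy the `=` comparisons in the join, so this variant leaves them untouched.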

The following query only SELECTs the duplicated rows for each email and date, that is, every row except the one with the lowest id in its group:

SELECT a.*
FROM tableName a
    LEFT JOIN (
            SELECT minID
            FROM (
                    SELECT email, date, MIN(id) minID
                    FROM tableName
                    GROUP BY email, date
                    ) y
            ) x
            ON a.ID = x.minID
WHERE x.minID IS NULL;
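
To try the SELECT safely before touching real data, something like the following scratch setup might help. It is only a sketch: the table name `dedupe_demo`, the column types, and the two e-mail addresses are hypothetical placeholders modelled on the sample in the question, not the original values. Because a plain SELECT has no restriction on reading the same table twice, the extra nesting level is dropped here.

```sql
-- Hypothetical scratch table; names, types and e-mail values are assumptions.
CREATE TABLE dedupe_demo (
    id    INT PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    date  DATE NOT NULL,
    data  VARCHAR(255)
);

INSERT INTO dedupe_demo (id, email, date, data) VALUES
    (1, 'a@example.com', '2012-01-01', 'my_data'),
    (2, 'b@example.com', '2012-01-01', 'my_data'),
    (3, 'a@example.com', '2012-01-02', 'my_data'),
    (4, 'a@example.com', '2012-01-02', 'my_data'),  -- duplicate of id 3
    (5, 'b@example.com', '2012-01-02', 'my_data');

-- Rows whose id is not the minimum of their (email, date) group.
SELECT a.*
FROM dedupe_demo a
    LEFT JOIN (
        SELECT MIN(id) AS minID
        FROM dedupe_demo
        GROUP BY email, date
    ) x ON a.id = x.minID
WHERE x.minID IS NULL;
-- With the five rows above, only the row with id 4 is returned.

DROP TABLE dedupe_demo;  -- clean up the scratch table
```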

1 Comment

John deserves double points because he answered it perfectly and introduced me to SQL Fiddle! Awesome!
0

Another approach is to count the number of occurrences of the date column for each email address in your table:

SELECT `email`, `date`, COUNT(*) FROM `my_table` GROUP BY `date`, `email` HAVING COUNT(*) > 1

+------------------+---------------------+----------+
| email            | date                | COUNT(*) |
+------------------+---------------------+----------+
| [email protected] | 2012-01-02 00:00:00 |        2 |
+------------------+---------------------+----------+
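
The count alone does not show which ids belong to each duplicate group. As a rough sketch, the same GROUP BY can also report the id to keep and the full id list. It assumes the `id` column from the question; the aliases `copies`, `keep_id` and `all_ids` are made up for illustration.

```sql
-- Sketch: one row per duplicated (email, date) pair, with the ids involved.
SELECT `email`,
       `date`,
       COUNT(*)                         AS copies,   -- rows in the group
       MIN(`id`)                        AS keep_id,  -- the "original" row
       GROUP_CONCAT(`id` ORDER BY `id`) AS all_ids   -- e.g. '3,4'
FROM `my_table`
GROUP BY `email`, `date`
HAVING COUNT(*) > 1;
```

GROUP_CONCAT output is truncated at `group_concat_max_len` (1024 bytes by default), so very large duplicate groups may show an incomplete id list.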

2 Comments

But then I have to do it for every email address.
@EricCope I produced this output from your sample data, but if you have many emails the query automatically lists every email and date combination with a count > 1. You also have to keep in mind how many rows you want to delete: in the case above the count is 2, so you can delete only one row, not two(!). To be on the safe side, you could rewrite my query as: SELECT email, date, COUNT(*) AS 'count', CONCAT('DELETE FROM my_table WHERE email = \'', email, '\' AND date = \'', date, '\' ORDER BY id DESC LIMIT ', (COUNT(*) - 1), ';') AS 'query' FROM my_table GROUP BY date, email HAVING COUNT(*) > 1;
