0

The table I have is

id    user_id    date        created_at
1     123     2020-02-02  2020-02-02 10:00:00
2     123     2020-02-02  2020-02-02 10:00:01
3     789     2020-02-12  2020-02-12 12:00:00
4     456     2020-02-10  2020-02-10 10:00:00
5     456     2020-02-10  2020-02-10 10:00:01

I want to delete duplicate entries and I want the desired output -

id    user_id    date        created_at
1     123     2020-02-02  2020-02-02 10:00:00
3     789     2020-02-12  2020-02-12 12:00:00
4     456     2020-02-10  2020-02-10 10:00:00

I tried the following query -

DELETE
    `a`
FROM
    `table1` AS `a`,
    `table1` AS `b`
WHERE
    `a`.`id` < `b`.`id` AND `a`.`user_id` <=> `b`.`user_id`

But it's taking too long and the error I get is

Lock wait timeout exceeded; try restarting transaction

The table I have has more than 9500000 entries.

What could be a better alternative query?

5
  • What is the criteria which allows to decide what row form a lot rows with the same user_id must be stored and what rows must be deleted? least id? Commented Feb 27, 2020 at 11:03
  • Yes, the least ID. Commented Feb 27, 2020 at 11:04
  • @rik_maz you can check my answer Commented Feb 27, 2020 at 11:24
  • It's often quicker to create a new table, retaining just the rows you want to keep, an then replacing the old table with the new one. Commented Feb 27, 2020 at 13:52
  • Possible duplicate of stackoverflow.com/questions/18932/… Commented Feb 28, 2020 at 7:19

2 Answers 2

1

You have lots of records hence it is giving timeout error. Perform this operation in smaller chunk. Try below query

DELETE
        `a`
    FROM
        `table1` AS `a` inner join
        `table1` AS `b`
    WHERE
        `a`.`id` < `b`.`id` AND `a`.`user_id` == `b`.`user_id` and `a`.`id` >***10000(Increase this)***

Hope this will helpful for you.

Sign up to request clarification or add additional context in comments.

Comments

0

Try to use correlated subquery:

DELETE t1
FROM table1 t1
WHERE EXISTS ( SELECT NULL
               FROM table1 t2
               WHERE t1.user_id = t2.user_id
                 AND t1.id > t2.id )

The index (user_id, id) will increase the query speed.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.