Following my previous question, I am now trying to remove duplicates from my database. I first run a sub-query to identify the near-identical records (the only difference is the index column "id"). My table has roughly 9 million records, and I had to interrupt the code below after roughly 1.5 hours.

DELETE FROM public."OptionsData" 
WHERE id NOT IN
(
    SELECT id FROM (
        SELECT DISTINCT ON (asofdate, contract, strike, expiry, type, last, bid, ask, volume, iv, moneyness, underlying, underlyingprice) * FROM public."OptionsData"
    ) AS TempTable
);  

Producing the results of the sub-query takes about 1 minute. Is the full query simply expected to take this long, or is there something off in my code?

  • How many duplicates do you have? Commented Aug 24, 2020 at 13:15
  • Total number of records: 8'764'239. The sub-query returns 8'681'440 unique records, so there are 82'799 duplicates. Hmm, assuming 1 second to delete a record, that would take almost 23 hours (?) Commented Aug 24, 2020 at 13:21

1 Answer

NOT IN combined with a DISTINCT is usually quite slow.

Deleting duplicates with EXISTS is typically faster:

DELETE FROM public."OptionsData" d1
WHERE EXISTS (select *
              from public."OptionsData" d2
              where d1.id > d2.id
                and (d1.asofdate, d1.contract, d1.strike, d1.expiry, d1.type, d1.last, d1.bid, d1.ask, d1.volume, d1.iv, d1.moneyness, d1.underlying, d1.underlyingprice)
                    = (d2.asofdate, d2.contract, d2.strike, d2.expiry, d2.type, d2.last, d2.bid, d2.ask, d2.volume, d2.iv, d2.moneyness, d2.underlying, d2.underlyingprice)
              );

This keeps the row with the smallest id in each group of duplicates. To keep the row with the highest id instead, use where d1.id < d2.id.
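
To see the "keep the smallest id" behaviour on a small scale, here is a minimal sketch that runs the same kind of EXISTS delete against an in-memory SQLite database from Python. It is only an illustration under assumptions: the table name options_data, the three columns standing in for the thirteen compared above, and the sample rows are all made up, and SQLite is not PostgreSQL, so this says nothing about timing on the real table.

```python
import sqlite3

# Hypothetical miniature table: three columns stand in for the thirteen
# columns compared in the answer. Names and data are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE options_data ("
    "id INTEGER PRIMARY KEY, asofdate TEXT, contract TEXT, strike REAL)"
)
rows = [
    (1, "2020-08-21", "ABC", 100.0),
    (2, "2020-08-21", "ABC", 100.0),  # duplicate of id 1
    (3, "2020-08-21", "ABC", 105.0),
    (4, "2020-08-21", "ABC", 100.0),  # duplicate of id 1
    (5, "2020-08-24", "XYZ", 50.0),
]
conn.executemany("INSERT INTO options_data VALUES (?, ?, ?, ?)", rows)

# Same shape as the answer's statement: delete a row whenever an otherwise
# identical row with a smaller id exists. Inside the subquery the inner copy
# is only visible as d2, so options_data.* refers to the row being deleted.
conn.execute(
    """
    DELETE FROM options_data
    WHERE EXISTS (
        SELECT 1
        FROM options_data AS d2
        WHERE options_data.id > d2.id
          AND options_data.asofdate = d2.asofdate
          AND options_data.contract = d2.contract
          AND options_data.strike = d2.strike
    )
    """
)

remaining_ids = [r[0] for r in conn.execute("SELECT id FROM options_data ORDER BY id")]
print(remaining_ids)  # [1, 3, 5]
```

One thing to keep in mind with this approach: a comparison using = does not treat NULLs as equal, whereas DISTINCT ON does. If any of the compared columns can be NULL, rows that differ only by id may survive the delete; in PostgreSQL, comparing the two row values with IS NOT DISTINCT FROM instead of = should cover that case.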

1 Comment

Thanks, EXISTS is definitely faster!!
