
Any idea why this statement takes so long on 300,000 rows? The query is meant to find duplicate records.

SELECT DISTINCT 
    a.Id,
    b.Id as sid
FROM 
    csv_temp a
INNER JOIN 
    csv_temp b ON a.firstname = b.firstname AND 
    a.lastname = b.lastname  AND 
    ((a.address = b.address) OR 
     (a.zip = b.zip) OR 
     (a.city = b.city AND a.state = b.state))
WHERE 
    a.Id <> b.Id AND 
    a.status=2 AND 
    b.status=1 AND 
    a.flag !=1 AND 
    b.flag !=1


  • Try adding indexes for all these column combinations. (Commented Feb 14, 2013 at 8:53)
  • Show the EXPLAIN output for the query and the table creation statements. (Commented Feb 14, 2013 at 8:53)
  • It would be better to use millions instead of lakh. Thanks. (Commented Feb 14, 2013 at 8:58)
  • Are all the compared columns indexed? (Commented Feb 14, 2013 at 9:00)
  • Yes, I have added all the indexes. (Commented Feb 14, 2013 at 9:08)
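On "all indexes": separate single-column indexes often help less than expected here, because for each row of `a` the optimizer usually looks up matching rows in `b` through one index on that table (index merge aside). A minimal sketch, assuming MySQL (the question does not name the database) and a guessed table layout, since the real DDL was never posted: the column types and the `idx_dup_*` index names below are hypothetical. It creates one composite index per matching rule, each leading with the equality columns that every rule shares.

```sql
-- Hypothetical table layout: the question never shows the real DDL,
-- so these column types are assumptions for illustration only.
CREATE TABLE csv_temp (
    Id        INT PRIMARY KEY,
    firstname VARCHAR(100),
    lastname  VARCHAR(100),
    address   VARCHAR(150),
    zip       VARCHAR(10),
    city      VARCHAR(100),
    state     VARCHAR(50),
    status    INT,
    flag      INT
);

-- One composite index per matching rule (names are illustrative).
-- Each starts with the columns compared for equality in every rule
-- (firstname, lastname) plus the constant filter b.status = 1, so a
-- lookup into alias b can use the whole index prefix.
CREATE INDEX idx_dup_address   ON csv_temp (firstname, lastname, status, address);
CREATE INDEX idx_dup_zip       ON csv_temp (firstname, lastname, status, zip);
CREATE INDEX idx_dup_citystate ON csv_temp (firstname, lastname, status, city, state);
```

The `flag != 1` tests are not equalities, so they are left out of the indexes and filtered after the lookup.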

2 Answers


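ORs often perform poorly, and in JOIN conditions I would expect that to be worse, since the optimizer typically cannot use a single index to satisfy all of the ORed branches. Try three SELECTs (one for each of the ORed conditions) and UNION the results together. I suspect the DISTINCT is not required either if this is done, because UNION (unlike UNION ALL) already removes duplicate rows: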

SELECT  
    a.Id,
    b.Id as sid
FROM 
    csv_temp a
INNER JOIN 
    csv_temp b ON a.firstname = b.firstname AND 
    a.lastname = b.lastname  AND 
    a.address = b.address
WHERE 
    a.Id <> b.Id AND 
    a.status=2 AND 
    b.status=1 AND 
    a.flag !=1 AND 
    b.flag !=1
UNION
SELECT  
    a.Id,
    b.Id as sid
FROM 
    csv_temp a
INNER JOIN 
    csv_temp b ON a.firstname = b.firstname AND 
    a.lastname = b.lastname  AND 
    a.zip = b.zip
WHERE 
    a.Id <> b.Id AND 
    a.status=2 AND 
    b.status=1 AND 
    a.flag !=1 AND 
    b.flag !=1
UNION
SELECT  
    a.Id,
    b.Id as sid
FROM 
    csv_temp a
INNER JOIN 
    csv_temp b ON a.firstname = b.firstname AND 
    a.lastname = b.lastname  AND 
    a.city = b.city AND a.state = b.state
WHERE 
    a.Id <> b.Id AND 
    a.status=2 AND 
    b.status=1 AND 
    a.flag !=1 AND 
    b.flag !=1
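To check that the rewrite returns the same pairs as the original ORed JOIN, here is a small self-contained test on made-up rows. The table layout and data are hypothetical, not from the question. Row 5 matches both the address rule and the zip rule, which shows UNION returning that pair only once without any DISTINCT.

```sql
-- Hypothetical table and rows for illustration; not the asker's real data.
CREATE TABLE csv_temp (
    Id INT PRIMARY KEY, firstname VARCHAR(100), lastname VARCHAR(100),
    address VARCHAR(150), zip VARCHAR(10), city VARCHAR(100),
    state VARCHAR(50), status INT, flag INT
);
INSERT INTO csv_temp VALUES
    (1, 'Ann', 'Lee', '1 Main St', '10001', 'Springfield', 'IL', 2, 0), -- status 2: the row to match
    (2, 'Ann', 'Lee', '1 Main St', '99999', 'Shelbyville', 'IL', 1, 0), -- same address only
    (3, 'Ann', 'Lee', '9 Oak Ave', '10001', 'Ogdenville',  'NY', 1, 0), -- same zip only
    (4, 'Ann', 'Lee', '5 Elm Rd',  '55555', 'Springfield', 'IL', 1, 0), -- same city and state only
    (5, 'Ann', 'Lee', '1 Main St', '10001', 'Otherton',    'TX', 1, 0), -- same address AND zip
    (6, 'Ann', 'Lee', '1 Main St', '10001', 'Springfield', 'IL', 1, 1), -- excluded: flag = 1
    (7, 'Bob', 'Lee', '1 Main St', '10001', 'Springfield', 'IL', 1, 0); -- excluded: other firstname

-- The UNION rewrite without DISTINCT; UNION itself removes duplicate pairs.
SELECT a.Id, b.Id AS sid
FROM csv_temp a
INNER JOIN csv_temp b ON a.firstname = b.firstname AND a.lastname = b.lastname
    AND a.address = b.address
WHERE a.Id <> b.Id AND a.status = 2 AND b.status = 1 AND a.flag != 1 AND b.flag != 1
UNION
SELECT a.Id, b.Id AS sid
FROM csv_temp a
INNER JOIN csv_temp b ON a.firstname = b.firstname AND a.lastname = b.lastname
    AND a.zip = b.zip
WHERE a.Id <> b.Id AND a.status = 2 AND b.status = 1 AND a.flag != 1 AND b.flag != 1
UNION
SELECT a.Id, b.Id AS sid
FROM csv_temp a
INNER JOIN csv_temp b ON a.firstname = b.firstname AND a.lastname = b.lastname
    AND a.city = b.city AND a.state = b.state
WHERE a.Id <> b.Id AND a.status = 2 AND b.status = 1 AND a.flag != 1 AND b.flag != 1
ORDER BY sid;
-- Expected rows (Id, sid): (1, 2), (1, 3), (1, 4), (1, 5); the pair (1, 5) appears once.
```

Running the original OR query (with its DISTINCT) on the same rows should return the same four pairs. With UNION ALL instead of UNION, the pair (1, 5) would come back twice, once from the address branch and once from the zip branch.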

After adding indexes on the columns used in the comparisons, check the query with EXPLAIN to see whether those indexes are actually used.
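A minimal sketch of that check, assuming MySQL syntax and a guessed table layout (the real DDL was never posted; the index name `idx_dup_zip` is illustrative): prefix one branch of the query with `EXPLAIN` and read the `key`, `type` and `rows` columns for each table alias.

```sql
-- Hypothetical minimal table; the real DDL was never posted.
CREATE TABLE csv_temp (
    Id INT PRIMARY KEY, firstname VARCHAR(100), lastname VARCHAR(100),
    address VARCHAR(150), zip VARCHAR(10), city VARCHAR(100),
    state VARCHAR(50), status INT, flag INT
);
-- Composite index matching the zip rule (name is illustrative).
CREATE INDEX idx_dup_zip ON csv_temp (firstname, lastname, status, zip);

-- In MySQL, a NULL key with type ALL for alias b means no index is used
-- to find matching rows, so the join scans the whole table for them.
EXPLAIN
SELECT a.Id, b.Id AS sid
FROM csv_temp a
INNER JOIN csv_temp b
    ON a.firstname = b.firstname AND a.lastname = b.lastname AND a.zip = b.zip
WHERE a.Id <> b.Id AND a.status = 2 AND b.status = 1
  AND a.flag != 1 AND b.flag != 1;
```

Repeating this for each UNION branch (and for the original OR version) shows which rule, if any, falls back to a full scan.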
