0

I have to clean up a database with a lot of orphaned entries. In this case I have 3 tables:

  • 'Email' (69529 entries)
  • 'ServiceHasEmail' (5782 entries)
  • 'UserHasEmail' (26254 entries)

The last two tables reference the 'Email' table, so that table should have 26254 + 5782 = 32036 entries.

I built a query to select all the entries in the 'Email' table that are not referenced in the UserHasEmail and ServiceHasEmail tables:

SELECT * FROM Email e 
WHERE e.EML_Id NOT IN (SELECT EML_Id FROM ServiceHasEmail) 
   AND e.EML_Id NOT IN (SELECT EML_Id FROM UserHasEmail)

But this query returns 40383 entries instead of the 37493 I expected (69529 - (26254 + 5782)).

What am I missing here?

3
  • 2
    Maybe some emails appear in both tables Commented Jan 19, 2012 at 13:50
  • Are you sure some email addresses don't appear in both ServiceHasEmail and UserHasEmail? That would result in more rows than your expected amount. Commented Jan 19, 2012 at 13:51
  • I guess 2890 EML_Id values are in both ServiceHasEmail and UserHasEmail. Try: SELECT count(*) FROM ServiceHasEmail INNER JOIN UserHasEmail ON ServiceHasEmail.EML_Id=UserHasEmail.EML_Id Commented Jan 19, 2012 at 13:51

4 Answers

1

This can happen because

  • the tables ServiceHasEmail and UserHasEmail both reference some of the same emails, or
  • the tables ServiceHasEmail and UserHasEmail contain duplicates.

You can verify this by comparing the distinct counts:

select count(distinct EML_Id) from Email

select count(distinct EML_Id) from ServiceHasEmail

select count(distinct EML_Id) from UserHasEmail

and

select count(distinct EML_Id)
from
(select EML_Id from ServiceHasEmail
 union all
 select EML_Id from UserHasEmail
) as refs

And your query should be

SELECT count(distinct e.EML_Id)
FROM Email e
WHERE e.EML_Id NOT IN (SELECT EML_Id FROM ServiceHasEmail)
   AND e.EML_Id NOT IN (SELECT EML_Id FROM UserHasEmail)
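
To make the arithmetic concrete, here is a minimal sketch on toy data. It uses Python's built-in sqlite3 module only so that it is self-contained; the six email ids and the reference rows are made up for illustration, and the `scalar` helper is hypothetical. The point is that the orphan count equals the total minus the number of distinct referenced ids, not the total minus the sum of the row counts.

```python
import sqlite3

# Toy data (made up): 6 emails with ids 1..6.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Email (EML_Id INTEGER PRIMARY KEY);
    CREATE TABLE ServiceHasEmail (EML_Id INTEGER);
    CREATE TABLE UserHasEmail (EML_Id INTEGER);
    INSERT INTO Email VALUES (1), (2), (3), (4), (5), (6);
    -- id 1 is duplicated inside ServiceHasEmail
    INSERT INTO ServiceHasEmail VALUES (1), (1), (2);
    -- id 2 is also referenced from UserHasEmail
    INSERT INTO UserHasEmail VALUES (2), (3);
""")

def scalar(sql):
    """Hypothetical helper: run a query and return its single value."""
    return conn.execute(sql).fetchone()[0]

total = scalar("SELECT COUNT(*) FROM Email")
reference_rows = (scalar("SELECT COUNT(*) FROM ServiceHasEmail")
                  + scalar("SELECT COUNT(*) FROM UserHasEmail"))

# UNION (without ALL) removes duplicates, so this counts distinct referenced ids.
distinct_referenced = scalar("""
    SELECT COUNT(*) FROM (
        SELECT EML_Id FROM ServiceHasEmail
        UNION
        SELECT EML_Id FROM UserHasEmail
    ) AS refs
""")

orphans = scalar("""
    SELECT COUNT(*) FROM Email e
    WHERE e.EML_Id NOT IN (SELECT EML_Id FROM ServiceHasEmail)
      AND e.EML_Id NOT IN (SELECT EML_Id FROM UserHasEmail)
""")

print(total - reference_rows)                # 1: the naive estimate
print(total - distinct_referenced, orphans)  # 3 3: the actual orphan count
```

Here the naive subtraction undercounts the orphans by 2, which is exactly the one duplicate row plus the one id shared between the two tables.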

2 Comments

It could also be due to NULL values, since a comparison with NULL inside a NOT IN never evaluates to TRUE.
Thanks! For some reason, some EML_Id values were duplicated in the ServiceHasEmail table although they shouldn't be. Looks like it will be more complicated than I expected to clean this database...
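
The NULL behaviour mentioned in the first comment can be reproduced with a small sketch. The data is made up and SQLite (via Python's sqlite3 module) stands in for the real database. Note the direction of the effect: a NULL in the subquery makes NOT IN return fewer rows (here none at all), so on its own it would not explain a count that is too high.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Email (EML_Id INTEGER PRIMARY KEY);
    CREATE TABLE ServiceHasEmail (EML_Id INTEGER);
    INSERT INTO Email VALUES (1), (2), (3);
    -- one real reference and one NULL reference (made-up data)
    INSERT INTO ServiceHasEmail VALUES (1), (NULL);
""")

# "2 NOT IN (1, NULL)" evaluates to NULL rather than TRUE, so no row passes.
not_in_rows = conn.execute("""
    SELECT EML_Id FROM Email
    WHERE EML_Id NOT IN (SELECT EML_Id FROM ServiceHasEmail)
""").fetchall()

# Filtering the NULLs out of the subquery restores the expected result.
null_safe_rows = conn.execute("""
    SELECT EML_Id FROM Email
    WHERE EML_Id NOT IN (
        SELECT EML_Id FROM ServiceHasEmail WHERE EML_Id IS NOT NULL
    )
    ORDER BY EML_Id
""").fetchall()

print(not_in_rows)     # []
print(null_safe_rows)  # [(2,), (3,)]
```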
0

Looks like some ServiceHasEmail and UserHasEmail rows reference the same Email.

0

You could have EML_Ids that are present in both ServiceHasEmail and UserHasEmail.

I suppose you have exactly 2,890 of them. Please try

SELECT * FROM `ServiceHasEmail` INNER JOIN `UserHasEmail` USING(`EML_Id`)

to verify this.
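
A surplus like this can come from ids shared between the two tables, from ids repeated within one table, or from both, and the join row count alone does not tell them apart. A rough sketch of how to separate the two cases, using made-up data and SQLite through Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ServiceHasEmail (EML_Id INTEGER);
    CREATE TABLE UserHasEmail (EML_Id INTEGER);
    -- made-up data: id 1 is duplicated, ids 2 and 7 are shared
    INSERT INTO ServiceHasEmail VALUES (1), (1), (2), (7);
    INSERT INTO UserHasEmail VALUES (2), (3), (7);
""")

# Ids referenced from both tables; DISTINCT keeps duplicates from inflating it.
shared_ids = conn.execute("""
    SELECT COUNT(DISTINCT s.EML_Id)
    FROM ServiceHasEmail s
    INNER JOIN UserHasEmail u ON u.EML_Id = s.EML_Id
""").fetchone()[0]

# Ids that occur more than once inside ServiceHasEmail, with their row counts.
duplicated_in_service = conn.execute("""
    SELECT EML_Id, COUNT(*) AS n
    FROM ServiceHasEmail
    GROUP BY EML_Id
    HAVING COUNT(*) > 1
""").fetchall()

print(shared_ids)             # 2
print(duplicated_in_service)  # [(1, 2)]
```

The same GROUP BY ... HAVING check can be run against UserHasEmail.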

0

You can use the following query:

SELECT * FROM Email e WHERE e.EML_Id NOT IN (SELECT EML_Id FROM ServiceHasEmail UNION SELECT EML_Id FROM UserHasEmail)
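
As a cross-check, the UNION form can be compared against a NOT EXISTS anti-join, which is a different technique from the one in this answer. The sketch below runs both on made-up data with SQLite through Python's sqlite3 module; on data without NULL references the two should return the same orphan ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Email (EML_Id INTEGER PRIMARY KEY);
    CREATE TABLE ServiceHasEmail (EML_Id INTEGER);
    CREATE TABLE UserHasEmail (EML_Id INTEGER);
    -- made-up data: ids 4 and 5 are orphans
    INSERT INTO Email VALUES (1), (2), (3), (4), (5);
    INSERT INTO ServiceHasEmail VALUES (1), (1), (2);
    INSERT INTO UserHasEmail VALUES (2), (3);
""")

# The query from the answer: one NOT IN over the UNION of both reference tables.
union_not_in = [row[0] for row in conn.execute("""
    SELECT e.EML_Id FROM Email e
    WHERE e.EML_Id NOT IN (
        SELECT EML_Id FROM ServiceHasEmail
        UNION
        SELECT EML_Id FROM UserHasEmail
    )
    ORDER BY e.EML_Id
""")]

# Alternative anti-join: keep an email only if no referencing row exists.
not_exists = [row[0] for row in conn.execute("""
    SELECT e.EML_Id FROM Email e
    WHERE NOT EXISTS (SELECT 1 FROM ServiceHasEmail s WHERE s.EML_Id = e.EML_Id)
      AND NOT EXISTS (SELECT 1 FROM UserHasEmail u WHERE u.EML_Id = e.EML_Id)
    ORDER BY e.EML_Id
""")]

print(union_not_in)  # [4, 5]
print(not_exists)    # [4, 5]
```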
