
In my database, when I run the following query I get 1077 as output:

select count(distinct a_t1) from t1;

Then, when I run this query, I get 459:

select count(distinct a_t1) from t1 
where a_t1 in (select a_t1 from t1 join t2 using (a_t1_t2) where a_t2=0); 

The above is equivalent to this query, which also gives 459:

select count(distinct a_t1) from t1 join t2 using (a_t1_t2) where a_t2=0;

But when I run this query, I get 0 instead of the 618 I was expecting:

select count(distinct a_t1) from t1 
where a_t1 not in (select a_t1 from t1 join t2 using (a_t1_t2) where a_t2=0);

I am running PostgreSQL 9.1.5, although the version may not be relevant. Please point out my mistake in the query above.

UPDATE 1: I created a new table and stored the result of the subquery above in it. Then I ran a few queries:

select count(distinct a_t1) from t1
where a_t1 not in (select a_t1 from sub_query_table order by a_t1 limit 10);

And hooray, now I get 10 as the answer! I was able to increase the limit up to 450. After that, I started getting 0 again.

UPDATE 2:

The sub_query_table has 459 values in it. Finally, this query gives me the required answer:

select count(distinct a_t1) from t1
where a_t1 not in (select a_t1 from sub_query_table order by a_t1 limit 459);

Whereas this one gives 0 as the answer:

select count(distinct a_t1) from t1
where a_t1 not in (select a_t1 from sub_query_table);

But why is this happening?

1 Answer

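The NOT IN operator only behaves as expected over non-NULL values. A comparison against NULL evaluates to NULL rather than true, so rows where a_t1 is NULL are not matched. To include them, test for NULL explicitly: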

select count(distinct a_t1) from t1 
where a_t1 not in (select a_t1 from t1 join t2 using (a_t1_t2) where a_t2=0) OR a_t1 IS NULL;
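NULLs on the subquery side are also worth checking. If the subquery returns even one NULL, `x NOT IN (...)` evaluates to NULL (never true) for every value that is not in the list, so the count drops to 0. Whether that is what happens here is an assumption, since the question does not show the data. The sketch below uses tiny hypothetical tables, with SQLite (via Python's `sqlite3`) standing in for PostgreSQL. The NOT IN behaviour it relies on is standard SQL.

```python
import sqlite3

# Hypothetical miniature of the question's tables; SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a_t1 INTEGER);
    CREATE TABLE sub_query_table (a_t1 INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3), (4);
    INSERT INTO sub_query_table VALUES (1), (2), (NULL);
""")

def count(sql):
    """Run a single-value query and return that value."""
    return conn.execute(sql).fetchone()[0]

# The subquery yields (1, 2, NULL): NOT IN is false for 1 and 2, and NULL for 3 and 4.
naive = count("""
    SELECT count(DISTINCT a_t1) FROM t1
    WHERE a_t1 NOT IN (SELECT a_t1 FROM sub_query_table)
""")

# Option 1: filter the NULLs out of the subquery.
filtered = count("""
    SELECT count(DISTINCT a_t1) FROM t1
    WHERE a_t1 NOT IN (SELECT a_t1 FROM sub_query_table WHERE a_t1 IS NOT NULL)
""")

# Option 2: NOT EXISTS, which is not affected by NULLs in the subquery.
not_exists = count("""
    SELECT count(DISTINCT a_t1) FROM t1
    WHERE NOT EXISTS (SELECT 1 FROM sub_query_table s WHERE s.a_t1 = t1.a_t1)
""")

print(naive, filtered, not_exists)
```

If the subquery in the question can return NULL for a_t1, adding `AND a_t1 IS NOT NULL` inside it, or rewriting with NOT EXISTS, would be the equivalent change.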

6 Comments

True, but that only accounts for 1 value of a_t1, there are 617 other values of a_t1 which I don't see in the result of my last query.
@Phani Is it possible that the DISTINCT reduces the count? Do all entries in the subselect have the same value?
I'm not sure I follow what you are saying. The 1st query shows that there are 1077 distinct a_t1 values. Then the 2nd query shows that, out of those 1077 values, 459 distinct values are present in the result of the subquery. So there should be 618 distinct a_t1 values that are not present in the result of the subquery. But the last query returns 0, and if I update it with OR a_t1=0, I get 1 as the result.
@Phani Did you type 'OR a_t1=0' or 'OR a_t1 IS NULL'?
I am sorry. I did use OR a_t1 IS NULL.
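The LIMIT experiment in the question would fit the NULL explanation, assuming sub_query_table holds at least one NULL in addition to its 459 values (the question does not confirm this). PostgreSQL sorts NULLs last in ascending order by default, so `order by a_t1 limit 459` would cut the NULL off, while the unlimited subquery would keep it. A minimal sketch with hypothetical data, using SQLite through Python's `sqlite3` and an explicit `ORDER BY (a_t1 IS NULL), a_t1` to mimic that NULLs-last ordering:

```python
import sqlite3

# Hypothetical data: three real values plus one NULL in the subquery table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a_t1 INTEGER);
    CREATE TABLE sub_query_table (a_t1 INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3), (4), (5);
    INSERT INTO sub_query_table VALUES (1), (2), (3), (NULL);
""")

def count_not_in(limit_clause):
    """Count t1 values not in the (optionally limited) subquery result."""
    # (a_t1 IS NULL) sorts non-NULL rows first, mimicking PostgreSQL's
    # default NULLS LAST behaviour for ascending order.
    sql = f"""
        SELECT count(DISTINCT a_t1) FROM t1
        WHERE a_t1 NOT IN (
            SELECT a_t1 FROM sub_query_table
            ORDER BY (a_t1 IS NULL), a_t1
            {limit_clause}
        )
    """
    return conn.execute(sql).fetchone()[0]

with_limit = count_not_in("LIMIT 3")  # the NULL row is cut off by the limit
without_limit = count_not_in("")      # the NULL row is part of the list

print(with_limit, without_limit)
```

With the limit, only 1, 2 and 3 are in the list, so 4 and 5 are counted. Without it, the NULL makes every NOT IN test either false or NULL, and the count is 0.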