4

I got table T with 2 column as follow example:

C1      C2
----------
A       x
A       x
A       y
B       x
B       x

I want to count number of distinct C1 for each value in C2. This result should be like:

C1      distinct count
----------------------
A       2               // count distinct x,x,y = 2
B       1               // count distinct x,x = 1

it is easy to come out with a SQL query like this

select C1, count(distinct C2) from T group by C1

however, as discussed in postgresql COUNT(DISTINCT …) very slow, this query yield poor performance. I would like to use the improved query (count (*) (select distinct ...)) as suggested in that article but I don't know how to form the query with group by.

1 Answer 1

6

Try this query if you want avoid DISTINCT keyword

Sample Data:

stackoverflow=# select * from T;
 c1 | c2 
----+----
 A  | x
 A  | x
 A  | y
 B  | x
 B  | x
(5 rows)

Query:

stackoverflow=# WITH count_distinct as (SELECT C1 FROM T GROUP BY c1,c2)
SELECT c1,count(c1) FROM count_distinct GROUP BY C1;  --updated missing group by

Output:

 c1 | count 
----+-------
 B  |     1
 A  |     2
(2 rows)

Same output, but you should try the performance first.

Sign up to request clarification or add additional context in comments.

1 Comment

You missed GROUP BY c1 for second query

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.