PostgreSQL: fast way to count distinct values per group

Question

I have a table T with two columns, for example:

C1      C2
----------
A       x
A       x
A       y
B       x
B       x

I want to count the number of distinct C2 values for each value of C1. The result should look like this:

C1      distinct count
----------------------
A       2               // count distinct x,x,y = 2
B       1               // count distinct x,x = 1
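To pin down what the desired result means, here is a minimal sketch in plain Python (not PostgreSQL) that computes the same per-C1 distinct counts from the example rows. The `rows` list simply restates the table above, and the variable names are illustrative only.

```python
from collections import defaultdict

# The example table T as (C1, C2) tuples.
rows = [("A", "x"), ("A", "x"), ("A", "y"), ("B", "x"), ("B", "x")]

# Collect the set of distinct C2 values seen for each C1.
distinct_c2 = defaultdict(set)
for c1, c2 in rows:
    distinct_c2[c1].add(c2)

# The size of each set is the "distinct count" for that C1.
result = {c1: len(values) for c1, values in sorted(distinct_c2.items())}
print(result)  # {'A': 2, 'B': 1}
```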

It is easy to come up with an SQL query like this:

select C1, count(distinct C2) from T group by C1

However, as discussed in postgresql COUNT(DISTINCT …) very slow, this query performs poorly. I would like to use the improved query (count (*) (select distinct ...)) suggested there, but I don't know how to write it with GROUP BY.
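One way to apply the "select distinct first, then count" idea per group is to deduplicate the (c1, c2) pairs in a subquery and then count rows per c1. The sketch below is an assumption about how that rewrite could look, not the linked article's exact query. It uses Python's built-in sqlite3 module purely as a stand-in for PostgreSQL so it can run anywhere; it only checks that both queries return the same rows and says nothing about PostgreSQL performance.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (results only, not speed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (c1 TEXT, c2 TEXT)")
conn.executemany(
    "INSERT INTO T (c1, c2) VALUES (?, ?)",
    [("A", "x"), ("A", "x"), ("A", "y"), ("B", "x"), ("B", "x")],
)

# Original form: count(DISTINCT ...) inside the GROUP BY.
naive = conn.execute(
    "SELECT c1, count(DISTINCT c2) FROM T GROUP BY c1 ORDER BY c1"
).fetchall()

# Rewritten form: remove duplicate (c1, c2) pairs first, then count per c1.
# count(c2) skips NULLs, just as count(DISTINCT c2) does; count(*) would
# also count a NULL c2 as one extra value.
rewritten = conn.execute(
    """
    SELECT c1, count(c2)
    FROM (SELECT DISTINCT c1, c2 FROM T) AS d
    GROUP BY c1
    ORDER BY c1
    """
).fetchall()

print(naive)      # [('A', 2), ('B', 1)]
print(rewritten)  # [('A', 2), ('B', 1)]
```

The SQL inside the second query is plain standard SQL, so the same text should also run in PostgreSQL; the subquery alias `d` is included because older PostgreSQL versions require an alias on a subquery in FROM.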

Adrian Hartanto · Accepted Answer · 2017-07-07 07:31:39Z


Try this query if you want to avoid the DISTINCT keyword:

Sample Data:

stackoverflow=# select * from T;
 c1 | c2 
----+----
 A  | x
 A  | x
 A  | y
 B  | x
 B  | x
(5 rows)

Query:

stackoverflow=# WITH count_distinct AS (SELECT c1 FROM T GROUP BY c1, c2)  -- one row per distinct (c1, c2) pair
SELECT c1, count(c1) FROM count_distinct GROUP BY c1;  -- number of distinct pairs, i.e. distinct c2 values, per c1

Output:

 c1 | count 
----+-------
 B  |     1
 A  |     2
(2 rows)

This produces the same output as the count(DISTINCT ...) query, but you should measure its performance on your own data before switching.
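Before comparing speed, it can help to confirm that the answer's query really matches count(DISTINCT c2) on more than the five sample rows. The sketch below again uses Python's sqlite3 only as a stand-in for PostgreSQL; the random data (1,000 rows, letters A to E and v to z, fixed seed) is hypothetical and deliberately contains no NULLs.

```python
import random
import sqlite3

# SQLite stand-in for PostgreSQL: this checks results, not performance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (c1 TEXT, c2 TEXT)")

# Hypothetical random test data without NULLs: 1,000 rows over a few values.
rng = random.Random(42)
conn.executemany(
    "INSERT INTO T (c1, c2) VALUES (?, ?)",
    [(rng.choice("ABCDE"), rng.choice("vwxyz")) for _ in range(1000)],
)

# Query from the question.
distinct_version = conn.execute(
    "SELECT c1, count(DISTINCT c2) FROM T GROUP BY c1 ORDER BY c1"
).fetchall()

# Query from the answer: GROUP BY c1, c2 leaves one row per distinct pair,
# so counting those rows per c1 gives the number of distinct c2 values.
cte_version = conn.execute(
    """
    WITH count_distinct AS (SELECT c1 FROM T GROUP BY c1, c2)
    SELECT c1, count(c1) FROM count_distinct GROUP BY c1 ORDER BY c1
    """
).fetchall()

print(distinct_version == cte_version)  # True
```

The equivalence holds only without NULLs: a NULL in c2 becomes its own (c1, NULL) group and is counted by the CTE version but ignored by count(DISTINCT c2), and a NULL c1 group gets count(c1) = 0. For the actual timing comparison, running EXPLAIN ANALYZE on each query in PostgreSQL is the usual approach.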

edited Jul 7, 2017 at 7:31

answered Jul 7, 2017 at 4:51

1 Comment

Oto Shavadze: You missed GROUP BY c1 in the second query.
