80

I am using count and group by to get the number of subscribers registered each day:

  SELECT created_at, COUNT(email)  
    FROM subscriptions 
GROUP BY created at;

Result:

created_at  count
-----------------
04-04-2011  100
05-04-2011   50
06-04-2011   50
07-04-2011  300

I want to get the cumulative total of subscribers every day instead. How do I get this?

created_at  count
-----------------
04-04-2011  100
05-04-2011  150
06-04-2011  200
07-04-2011  500

6 Answers 6

133

With larger datasets, window functions are the most efficient way to perform these kinds of queries -- the table will be scanned only once, instead of once for each date, like a self-join would do. It also looks a lot simpler. :) PostgreSQL 8.4 and up have support for window functions.

This is what it looks like:

SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
FROM subscriptions
GROUP BY created_at;

Here OVER creates the window; ORDER BY created_at means that it has to sum up the counts in created_at order.


Edit: If you want to remove duplicate emails within a single day, you can use sum(count(distinct email)). Unfortunately this won't remove duplicates that cross different dates.

If you want to remove all duplicates, I think the easiest is to use a subquery and DISTINCT ON. This will attribute emails to their earliest date (because I'm sorting by created_at in ascending order, it'll choose the earliest one):

SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
FROM (
    SELECT DISTINCT ON (email) created_at, email
    FROM subscriptions ORDER BY email, created_at
) AS subq
GROUP BY created_at;

If you create an index on (email, created_at), this query shouldn't be too slow either.


(If you want to test, this is how I created the sample dataset)

create table subscriptions as
   select date '2000-04-04' + (i/10000)::int as created_at,
          '[email protected]' || (i%700000)::text as email
   from generate_series(1,1000000) i;
create index on subscriptions (email, created_at);
Sign up to request clarification or add additional context in comments.

6 Comments

This is great intgr, only that my subscriptions table contains a lot of duplicate email rows. So what over is doing is sum-ming the count numbers, but I still need to recalculate the unique emails on every subsequent date.
I updated my answer with a DISTINCT ON subquery. It's still a lot faster than Andriy's answer -- can process a million rows within a few seconds -- but perhaps more complicated.
Nice tip on the generate_series function!
Note that DISTINCT ON can also be turned into an equivalent query with GROUP BY; in this case, SELECT email, MIN(created_at) as created_at FROM subscriptions GROUP BY email. Which is more efficient will probably vary, although ready-sorted sub-query from the DISTINCT ON seems to give some advantage to the sort needed by the Window function.
I'd like to have this on a per month basis, how would I need to change this query? I'm having real problems with that.
|
9

Use:

SELECT a.created_at,
       (SELECT COUNT(b.email)
          FROM SUBSCRIPTIONS b
         WHERE b.created_at <= a.created_at) AS count
  FROM SUBSCRIPTIONS a

Comments

6

To anyone seeing this answer today (2021) You can use rollup

SELECT created_at, COUNT(email)  
    FROM subscriptions 
GROUP BY rollup(created_at);

this will give you a new row with the total

created_at  count
-----------------
04-04-2011  100
05-04-2011   50
06-04-2011   50
07-04-2011  300
NULL        500

You can also use rollup for partial results if you have more than one parameter to show in your group by. If you have a created_by for instance:

SELECT created_at, created_by COUNT(email)  
    FROM subscriptions 
GROUP BY rollup(created_at, created_by);

this will give you a new row with the total

created_at  created_by  count
-----------------------------
04-04-2011     1        80
04-04-2011     2        20
04-04-2021    NULL      100
05-04-2011     1        20
05-04-2011     2        30
05-04-2011    NULL      50
NULL          NULL      150

I only took the numbers of the first two days, but that's the idea. it will show grouped by date, then total of that day, then the total of totals.

Order matters in the rollup() here, as to how the partial totals will be displayed

3 Comments

This should really be the accepted answer in 2021+ ! On my system: Window function (Planning Time: 1.134 ms, Execution Time: 1.045 ms) , Rollup (Planning Time: 0.245 ms,Execution Time: 0.642 ms). Rollup is significantly more performant.
This isn't cumulative, as the OP requests.
rollup doesn't work for me.
2
SELECT
  s1.created_at,
  COUNT(s2.email) AS cumul_count
FROM subscriptions s1
  INNER JOIN subscriptions s2 ON s1.created_at >= s2.created_at
GROUP BY s1.created_at

3 Comments

I've tried the sum(s2.count) and the console gives an error: 'aggregate function calls cannot be nested'
I meant that to be COUNT(s2.email), sorry. Please see my edited solution.
Thanks buddy! I was working with a more complicated query, and your structure is easy to understand (and thus, to implement).
2

I assume you want only one row per day and you want to still show days without any subscriptions (suppose nobody subscribes for a certain date, do you want to show that date with the balance of the previous day?). If this is the case, you can use the 'with' feature:

with recursive serialdates(adate) as (
    select cast('2011-04-04' as date)
    union all
    select adate + 1 from serialdates where adate < cast('2011-04-07' as date)
)
select D.adate,
(
    select count(distinct email)
    from subscriptions
    where created_at between date_trunc('month', D.adate) and D.adate
)
from serialdates D

2 Comments

Thanks, that with function could be useful too. Learnt something new.
Instead of serialdates you can use the built-in function: generate_series(timestamp '2011-04-04', timestamp '2011-04-07', interval '1 day')
-3

The best way is to have a calendar table: calendar ( date date, month int, quarter int, half int, week int, year int )

Then, you can join this table to make summary for the field you need.

1 Comment

That has nothing to do with getting a running total.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.