Postgresql aggregate values based on date

Question

I have a table with following columns:

**name** | **start_date** | **termination_date** |  
X1 | 2019-01-01 | NULL  
X2 | 2018-05-01 | 2019-03-01

I want to figure out how many people have been subscribed per month. Meaning that the start_date was before month X and the termination_date was either afterwards or is null.
E.g. For '2019-01-01' I want every row where the start_date was before or at '2019-01-01' and the termination_date is either after '2019-01-01' or Null

I tried it with this query:

SELECT start_date,
       sum(count(distinct name)) OVER (order by start_date)
from Table
where
    is_active = TRUE
    and (termination_date >= start_date or termination_date is null)
Group by start_date
Order by start_date;

However, when I checked some of the results by hand I saw that it was not returning the correct values but the sum() is to high. I have no idea why this happens.

Can anyone point me in the rigth direction?

I didn't get it. You want to count the records per month if the range(start, termination) lies in the month. So for x2: you want every month from 2018-05 to 2019-03? Does 2019-03 still count? For x1 you want every month from 2019-01 until now? — S-Man
– S-Man, Commented Jul 25, 2019 at 12:56
both start_date and termination_date have the date data-type If I want to know how many active users I had in 2019-02 I want both rows to be counted but if I ask for actives for 2019-04 I only want 1 — Insworn
– Insworn, Commented Jul 25, 2019 at 13:00

S-Man · Accepted Answer · 2019-07-25 13:10:49Z

2

demo:db<>fiddle

SELECT
    gs::date,
    COUNT(*)
FROM
    subscription,
    generate_series(                                             -- 1
        date_trunc('month', start_date),                         -- 2
        date_trunc('month', COALESCE(termination_date, now())),  -- 3
        interval '1 month'
    ) gs
GROUP BY gs::date                                                -- 4
ORDER BY 1

generate_series() generates dates in a certain interval. In this case every month.
The start is the first day of the start month (calculated by date_trunc()).
The end is the first day of the termination month. If there is none, the current day is taken (COALESCE())
This calculation is done for every record separately. So the dates can be grouped afterwards.

Of course you can format your month using to_char(): demo

edited Jul 25, 2019 at 13:10

answered Jul 25, 2019 at 13:01

S-Man

24k9 gold badges51 silver badges78 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Insworn Over a year ago

Thanks for the answer! One question though: Won't this be pretty time consuming when the database get's larger?

S-Man Over a year ago

Of course, every query can. It depends on your data, table structure, indexes, hardware and so on. But you want to get the intermediate months, so they have to be created. This is the simplest way to do it.

Collectives™ on Stack Overflow

Postgresql aggregate values based on date

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related