1

I have a table with following columns:

**name** | **start_date** | **termination_date** |  
X1 | 2019-01-01 | NULL  
X2 | 2018-05-01 | 2019-03-01 

I want to figure out how many people have been subscribed per month. Meaning that the start_date was before month X and the termination_date was either afterwards or is null.
E.g. For '2019-01-01' I want every row where the start_date was before or at '2019-01-01' and the termination_date is either after '2019-01-01' or Null

I tried it with this query:

SELECT start_date,
       sum(count(distinct name)) OVER (order by start_date)
from Table
where
    is_active = TRUE
    and (termination_date >= start_date or termination_date is null)
Group by start_date
Order by start_date;

However, when I checked some of the results by hand I saw that it was not returning the correct values but the sum() is to high. I have no idea why this happens.

Can anyone point me in the rigth direction?

3
  • I didn't get it. You want to count the records per month if the range(start, termination) lies in the month. So for x2: you want every month from 2018-05 to 2019-03? Does 2019-03 still count? For x1 you want every month from 2019-01 until now? Commented Jul 25, 2019 at 12:56
  • What data type is start_date? Commented Jul 25, 2019 at 12:57
  • both start_date and termination_date have the date data-type If I want to know how many active users I had in 2019-02 I want both rows to be counted but if I ask for actives for 2019-04 I only want 1 Commented Jul 25, 2019 at 13:00

1 Answer 1

2

demo:db<>fiddle

SELECT
    gs::date,
    COUNT(*)
FROM
    subscription,
    generate_series(                                             -- 1
        date_trunc('month', start_date),                         -- 2
        date_trunc('month', COALESCE(termination_date, now())),  -- 3
        interval '1 month'
    ) gs
GROUP BY gs::date                                                -- 4
ORDER BY 1
  1. generate_series() generates dates in a certain interval. In this case every month.
  2. The start is the first day of the start month (calculated by date_trunc()).
  3. The end is the first day of the termination month. If there is none, the current day is taken (COALESCE())
  4. This calculation is done for every record separately. So the dates can be grouped afterwards.

Of course you can format your month using to_char(): demo

Sign up to request clarification or add additional context in comments.

2 Comments

Thanks for the answer! One question though: Won't this be pretty time consuming when the database get's larger?
Of course, every query can. It depends on your data, table structure, indexes, hardware and so on. But you want to get the intermediate months, so they have to be created. This is the simplest way to do it.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.