I have a table called answers with columns created_at and response, where response is an integer: 0 (for 'no'), 1 (for 'yes'), or 2 (for 'don't know'). I want a moving average of the response values for each day, filtering out the 2s and only taking into account the previous 30 days. I know you can do ROWS BETWEEN 29 PRECEDING AND CURRENT ROW, but that only works if you have data for each day, and in my case there might be no data for a week or more.

My current query is this:

SELECT answers.created_at, answers.response,
    AVG(answers.response)
      OVER(ORDER BY answers.created_at::date ROWS 
        BETWEEN 29 PRECEDING AND CURRENT ROW) AS rolling_average
  FROM answers
  WHERE answers.user_id = 'insert_user_id'
    AND (answers.response = 0 OR answers.response = 1)
  GROUP BY answers.created_at, answers.response
  ORDER BY answers.created_at::date

But this returns an average based on the previous rows rather than the previous days. If a user responded with a 1 on 2018-03-30 and a 0 on 2018-05-15, the rolling average on 2018-05-15 would be 0.5 instead of the 0 I want. How can I write a query that only takes into account the responses created within the last 30 days for the rolling average?

  • Provide example data on sqlfiddle.com and add expected results based on the example data. Commented Apr 26, 2018 at 15:16
  • Could you provide some sample data and the expected result? Commented Apr 26, 2018 at 15:16
  • "I know you can do ROWS BETWEEN 29 PRECEDING AND CURRENT ROW but that only works if you have data for each day, and in my case there might be no data for a week or more." PostgreSQL 8.0+ supports generate_series(), which can generate a calendar table for that. Commented Apr 26, 2018 at 15:20

2 Answers

4

Since Postgres 11, a window frame can use RANGE with an interval offset, so you can do this:

SELECT created_at, 
       response,
       AVG(response) OVER (ORDER BY created_at 
                           RANGE BETWEEN '29 day' PRECEDING AND current row) AS rolling_average 
FROM answers
WHERE user_id = 1
  AND response in (0,1)
ORDER BY created_at;
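
Two variations on the query above, as a rough sketch rather than a tested solution. The frame above is measured back from each row's exact timestamp, so if whole calendar days are wanted, the ordering can be cast to date instead. On versions before 11, where RANGE with an offset is not available, a correlated subquery over the same date range may stand in. Both statements assume created_at is a timestamp and reuse the placeholder user_id = 1 from the query above.

```sql
-- Variant 1 (Postgres 11+): calendar-day window.
-- Ordering by created_at::date makes the frame cover the current date and
-- the 29 dates before it, regardless of the time of day.
SELECT created_at,
       response,
       AVG(response) OVER (ORDER BY created_at::date
                           RANGE BETWEEN INTERVAL '29 days' PRECEDING
                                     AND CURRENT ROW) AS rolling_average
FROM answers
WHERE user_id = 1            -- placeholder id, as in the query above
  AND response IN (0, 1)
ORDER BY created_at;

-- Variant 2 (before Postgres 11): correlated subquery over the same date range.
-- date - integer yields a date, so "- 29" steps back 29 calendar days.
SELECT a.created_at,
       a.response,
       (SELECT AVG(b.response)
        FROM answers b
        WHERE b.user_id = a.user_id
          AND b.response IN (0, 1)
          AND b.created_at::date BETWEEN a.created_at::date - 29
                                     AND a.created_at::date) AS rolling_average
FROM answers a
WHERE a.user_id = 1          -- placeholder id, as in the query above
  AND a.response IN (0, 1)
ORDER BY a.created_at;
```

In both variants, rows that share a date are treated as peers, so every answer on a given day gets the same rolling average, including answers given later that day.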

0

Try something like this:

SELECT *
FROM (
    SELECT d.created_at,
           d.response,
           AVG(d.response)
             OVER (ORDER BY d.created_at::date
                   ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS rolling_average
    FROM (
        SELECT COALESCE(a.created_at, d.dates) AS created_at,
               response,
               a.user_id
        FROM (SELECT generate_series('2018-01-01'::date,
                                     '2018-05-31'::date,
                                     '1day'::interval)::date AS dates) d
        LEFT JOIN (SELECT *
                   FROM answers
                   WHERE answers.user_id = 'insert_user_id'
                     AND (answers.response = 0 OR answers.response = 1)) a
               ON d.dates = a.created_at::date
    ) d
    GROUP BY d.created_at, d.response
) agg
WHERE agg.response IS NOT NULL
ORDER BY agg.created_at::date

  • generate_series creates a list of days; you have to set reasonable boundaries for it
  • this list of days is LEFT JOINed with the preselected answers (responses 0 and 1 only)
  • the joined result is used for the rolling average calculation
  • finally, I keep only the rows that have a response, and I get:

created_at | response | rolling_average
2018-03-30 | 1        | 1.00000000000000000000
2018-05-15 | 0        | 0.00000000000000000000
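
One thing to watch with the ROWS frame above: it counts rows, so it lines up with calendar days only while the joined result has exactly one row per day. A possible variation, assuming several answers can share a date, is to aggregate per day first and then window over the calendar. This is only a sketch: the names days, daily, answer_day, yes_count and answer_count are made up for illustration, and the date boundaries and user id are the same placeholders as above.

```sql
WITH days AS (
    -- Calendar of candidate days; the boundaries are placeholders.
    SELECT generate_series('2018-01-01'::date, '2018-05-31'::date,
                           '1 day'::interval)::date AS answer_day
),
daily AS (
    -- One row per day that has answers: sum of 0/1 responses and their count.
    SELECT created_at::date AS answer_day,
           SUM(response)    AS yes_count,
           COUNT(*)         AS answer_count
    FROM answers
    WHERE user_id = 'insert_user_id'
      AND response IN (0, 1)
    GROUP BY created_at::date
)
SELECT answer_day, rolling_average
FROM (
    SELECT d.answer_day,
           dl.answer_count,
           -- The calendar has exactly one row per day, so 30 rows = 30 days.
           (SUM(dl.yes_count) OVER w)::numeric
               / SUM(dl.answer_count) OVER w AS rolling_average
    FROM days d
    LEFT JOIN daily dl ON dl.answer_day = d.answer_day
    WINDOW w AS (ORDER BY d.answer_day
                 ROWS BETWEEN 29 PRECEDING AND CURRENT ROW)
) t
WHERE answer_count IS NOT NULL   -- keep only days that actually have answers
ORDER BY answer_day;
```

Unlike the query above, this returns one row per day with answers rather than one row per answer. Dividing the summed responses by the summed counts weights every answer in the 30-day window equally.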
