I have a query (with a subquery) that calculates, for each day, the average temperature over the previous years within a window of plus or minus one week around that day. It works, but it is not very fast. The time series values below are just an example. I'm using doy (day of year) because I want a sliding window around the same date in every year.

SELECT days,
    (SELECT avg(temperature)
     FROM temperatures
     WHERE site_id = ? AND
      extract(doy FROM timestamp) BETWEEN
      extract(doy FROM days) - 7 AND extract(doy FROM days) + 7
    ) AS temperature
FROM generate_series('2017-05-01'::date, '2017-08-31'::date, interval '1 day') days

So my question is: can this query be improved? I was thinking about using some kind of window function, or possibly lag and lead. However, regular window functions (at least) only work on a fixed number of rows, whereas there can be any number of measurements within the two-week window.

I can live with what I have for now, but as the amount of data grows, so does the execution time of the query. The latter two extracts could be removed, but that gives no noticeable speed improvement and only makes the query less legible. Any help would be greatly appreciated.

  • Search for the term "sargable". I also suggest providing an explain plan for your existing query. Commented May 23, 2017 at 23:52

1 Answer

The best index for your original query is

create index idx_temperatures_site_id_timestamp_doy
  on temperatures(site_id, extract(doy from timestamp));

This can greatly improve your original query's performance.

While your original query is simple and readable, it has one flaw: it aggregates each day's rows about 14 times (on average), once for every window that contains that day. Instead, you could aggregate per day first and then compute each two-week window's weighted average of the daily averages (the weight of a day's average must be the number of rows for that day in your original table). Something like this:

with p as (
  -- output range; the series below is padded by one week on each side
  select timestamp '2017-05-01' min,
         timestamp '2017-08-31' max
)
select     t.*
from       p
cross join (select     days,
                       -- weighted average: windowed sum of daily sums / windowed sum of daily counts
                       sum(sum(temperature)) over pn1week / sum(count(temperature)) over pn1week as temperature
            from       p
            cross join generate_series(min - interval '1 week', max + interval '1 week', interval '1 day') days
            left join  temperatures on site_id = ? and extract(doy from timestamp) = extract(doy from days)
            group by   days
            window     pn1week as (order by days rows between 7 preceding and 7 following)) t
where      days between min and max
order by   days

However, there is not much gain here, as this is only twice as fast as your original query (with the optimal index).

http://rextester.com/JCAG41071
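
As a quick check of the weighting argument, here is a minimal Python sketch (not part of the SQL, and using made-up readings) that compares three ways of averaging a window: over all raw rows, as the sum of daily sums divided by the sum of daily counts, and as a naive unweighted average of daily averages. The dictionary `readings_by_day` and the function names are hypothetical, chosen only for illustration.

```python
from statistics import mean

# Hypothetical readings grouped by day; days hold different numbers of rows.
readings_by_day = {
    "day 1": [10.0, 12.0, 14.0, 16.0],  # 4 measurements
    "day 2": [20.0],                    # 1 measurement
    "day 3": [18.0, 22.0],              # 2 measurements
}


def window_average_from_rows(by_day):
    """Average over every raw row in the window (what the original subquery computes)."""
    rows = [t for temps in by_day.values() for t in temps]
    return sum(rows) / len(rows)


def window_average_from_daily_aggregates(by_day):
    """Sum of daily sums divided by sum of daily counts (the weighted form)."""
    daily = [(sum(temps), len(temps)) for temps in by_day.values()]
    return sum(s for s, _ in daily) / sum(n for _, n in daily)


def unweighted_average_of_daily_averages(by_day):
    """Naive average of daily averages; ignores how many rows each day has."""
    return mean(mean(temps) for temps in by_day.values())


print(window_average_from_rows(readings_by_day))
print(window_average_from_daily_aggregates(readings_by_day))
print(unweighted_average_of_daily_averages(readings_by_day))
```

With these readings the first two results agree, while the unweighted version differs because it gives the single reading on day 2 the same weight as the four readings on day 1. That is why the query divides a windowed sum of sums by a windowed sum of counts rather than averaging daily averages.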

Notes: I used timestamp because I assumed your column's type is timestamp. But as it turned out, you use timestamptz (a.k.a. timestamp with time zone). With that type, you cannot index the extract(doy from timestamp) expression, because that expression's output depends on the client's current time zone setting.
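
To illustrate why the day of year of a time-zone-aware value is not a fixed property of the stored instant, here is a small Python sketch (an analogy, not PostgreSQL itself). The chosen instant and the two fixed-offset zones are assumptions made only for this example.

```python
from datetime import datetime, timedelta, timezone

# One absolute instant, similar to what a timestamptz column stores.
instant = datetime(2017, 1, 1, 0, 30, tzinfo=timezone.utc)

# Two hypothetical session time zones, modelled as fixed offsets for simplicity.
utc = timezone.utc
utc_minus_5 = timezone(timedelta(hours=-5))


def day_of_year(moment, tz):
    """Day of year of `moment` as seen from time zone `tz`."""
    return moment.astimezone(tz).timetuple().tm_yday


doy_utc = day_of_year(instant, utc)
doy_minus_5 = day_of_year(instant, utc_minus_5)
print(doy_utc, doy_minus_5)
```

The same instant falls on day 1 of 2017 in UTC but on the last day of 2016 when viewed five hours behind UTC. An index has to store one fixed value per row, so an expression whose result changes with the viewer's time zone cannot be indexed directly.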

For timestamptz, use an index that (at least) starts with site_id. Using the window version should improve the performance anyway.

http://rextester.com/XTJSM42954

4 Comments

An interesting approach, and certainly much faster than my original one. My initial attempt was indeed indexing the table on doy but that does not work because apparently extract doy is not immutable. In any case, this works much much faster with the data I have.
@TeemuKarimerto that's because your column is actually timestamptz. Please see my edits (at notes).
Ah yes, that seems to be the issue with the indexing. I would prefer to use timestamp but these are all Django-generated tables and I'm not entirely sure how I ought to go about possibly converting the values in the database AND configuring Django so nothing breaks :D
It seems like trying to force Django to use timestamps without time zones is a bad idea. So I'm just going to skip the doy-based indexing and go with this query as it is certainly much faster than my original one.
