
I have a PostgreSQL table holding time series data. I want to aggregate values over configurable periods and durations, e.g. aggregate tag-n over 30-minute intervals for one day. The output should look like this:

Period   Max   Min   Avg
00:00    xx    xx    xx
00:30    xx    xx    xx
01:00    xx    xx    xx
...      ...   ...   ...
23:30    xx    xx    xx

To do this, I created a function that generates the custom intervals, and then used it in the query:

create or replace function interval_generator(start_ts timestamp, end_ts timestamp, round_interval interval)
returns table(row_seq bigint, start_time timestamp, end_time timestamp) as $$
begin
  return query
  select row_number() over () as row_seq,
         n as start_time,
         (n + round_interval) as end_time
  from generate_series(date_trunc('minute', start_ts), end_ts, round_interval) n;
end $$
language plpgsql;
with intervals as (
  select *
  from interval_generator('2022-08-25 00:00:00', '2022-08-26 23:59:59', '30 minute'::interval)
)
select intervals.row_seq, intervals.start_time, max("Value"), min("Value"), avg("Value")
from public.<TableName>
inner join intervals
  on "TimeStamp" >= intervals.start_time
 and "TimeStamp" <  intervals.end_time
where "Tag" = xxxxxxxxxx
group by intervals.row_seq, intervals.start_time
order by intervals.row_seq asc;
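For reference, the CTE on its own can be previewed as a quick sanity check (same literals as above; the output shown is what the function typically produces, since generate_series emits rows in order):

```sql
select *
from interval_generator('2022-08-25 00:00:00',
                        '2022-08-25 23:59:59',
                        '30 minute'::interval)
limit 3;
-- typically:
-- row_seq | start_time          | end_time
--       1 | 2022-08-25 00:00:00 | 2022-08-25 00:30:00
--       2 | 2022-08-25 00:30:00 | 2022-08-25 01:00:00
--       3 | 2022-08-25 00:30:00 + 30 min, and so on
```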

I needed the additional first column for the rest of the application logic. Is this the best way to get the desired output in a single query? Or would it be better to split it into multiple queries, one per interval?

I am in no way an expert in SQL or PostgreSQL; I put together the above queries using my Google skills. It works, but I am not sure it is optimized.

1 Answer

I have always found joins on any inequality (>, <, BETWEEN) to be horribly inefficient. Under the covers the planner appears to do a cartesian join and then filter out rows. I can't swear to that, but benchmarks have shown that you should avoid it if there is any way around it.

In this case, I think you should convert each timestamp to the desired interval using date_trunc. Unfortunately there is no native date_trunc for 30 minutes, but there is a way around that:

What is the fastest way to truncate timestamps to 5 minutes in Postgres?
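To make the arithmetic concrete, here is a standalone sketch of how one timestamp maps to its 30-minute bucket: for 10:47, date_part('minute', ...) gives 47, integer division 47 / 30 gives 1, so the bucket start is the truncated hour 10:00 plus 1 * 30 minutes = 10:30. (The column name ts here is just for illustration; no table is needed.)

```sql
-- Worked example of the 30-minute truncation:
select date_trunc('hour', ts) +
       date_part('minute', ts)::int / 30 * interval '30 min' as bucket_start
from (values (timestamp '2022-08-25 10:47:13')) as t(ts);
-- bucket_start = 2022-08-25 10:30:00
```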

Using the concept in the link above, I think you can optimize your query significantly by avoiding the generate_series / cartesian join:

select
  date_trunc('hour', "TimeStamp") +
    date_part('minute', "TimeStamp")::int / 30 * interval '30 min' as start_time,
  max("Value"), min("Value"), avg("Value")
from public.<TableName>
where "Tag" = xxxxxxxxxx
group by start_time
order by start_time;

If you need the row_seq, I would recommend using row_number().
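A sketch of adding that row_seq on top of the bucketed query, by wrapping it in a subquery (the identifiers are copied from the query above; the aliases max_val, min_val, avg_val and the subquery name buckets are mine):

```sql
select row_number() over (order by start_time) as row_seq,
       start_time, max_val, min_val, avg_val
from (
  select
    date_trunc('hour', "TimeStamp") +
      date_part('minute', "TimeStamp")::int / 30 * interval '30 min' as start_time,
    max("Value") as max_val, min("Value") as min_val, avg("Value") as avg_val
  from public.<TableName>
  where "Tag" = xxxxxxxxxx
  group by start_time
) buckets
order by row_seq;
```

Since row_number() is computed over the ordered bucket start times, the sequence numbers line up with the 00:00, 00:30, 01:00, ... periods in the desired output.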

