I have a system that saves statistics for servers in a network. Later, a user can consume all the data and plan their growth. It is therefore important to summarize the data into a graph, i.e. across an hour, day, week, year, etc.

I'm trying to do something like this:

select created_time / 60, count(*)
from pm_server_stat
group by (created_time / 60);

--with this index
CREATE INDEX pm_server_stat_created_time_60
  ON pm_server_stat
  USING btree
  ((created_time / 60));

This is the EXPLAIN output I get:

"GroupAggregate  (cost=189822.36..213951.06 rows=1206435 width=8)"
"  Output: ((created_time / 60)), count(*)"
"  ->  Sort  (cost=189822.36..192838.45 rows=1206435 width=8)"
"        Output: created_time, ((created_time / 60))"
"        Sort Key: ((pm_server_stat.created_time / 60))"
"        ->  Seq Scan on public.pm_server_stat  (cost=0.00..34967.44 rows=1206435 width=8)"
"              Output: created_time, (created_time / 60)"

Does anyone know why the index is not used and I get a sequential scan instead? I suspect that the types might be different.

1 Answer

PostgreSQL doesn't have "covering" indexes (index-only scans) in 9.1 or before. That means it has to access the table rows anyway, and since your query has no WHERE clause and reads every row, it might as well scan them sequentially. Index-only scans are due to appear in 9.2 (currently in beta testing if you want to try it out), but I'm not sure they'd be smart enough for this anyway.

An index-only approach will never work once you want "total files served" or "total packets transmitted" anyway, because those values are not in the index.

Typically, for this sort of summarizing task you'd have one or more summary tables: stats_minute, stats_hour, stats_day, stats_week etc. How many you'd have depends on total data size and performance requirements. Keep the summaries up to date with a simple cron job. If data is going to come in with "late" timestamps, you might need a slight lag or allow for recalculation.
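As a rough sketch of what such a cron job could do, here is one way to fill a stats_hour table. This is only an illustration under assumptions: SQLite (via Python's sqlite3) stands in for PostgreSQL so the example runs on its own, created_time is assumed to be Unix epoch seconds, and the ingress/egress columns, the stats_hour layout and the rollup_hours helper are hypothetical names, not taken from your schema.

```python
import sqlite3

# SQLite stands in for PostgreSQL; every name except
# pm_server_stat.created_time is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pm_server_stat (
        created_time INTEGER NOT NULL,  -- assumed: Unix epoch seconds
        ingress      INTEGER NOT NULL,  -- assumed column
        egress       INTEGER NOT NULL   -- assumed column
    );
    CREATE TABLE stats_hour (
        hour_start INTEGER PRIMARY KEY, -- epoch seconds at the start of the hour
        samples    INTEGER NOT NULL,
        ingress    INTEGER NOT NULL,
        egress     INTEGER NOT NULL
    );
""")


def rollup_hours(conn, since, now):
    """Recompute every complete hour in [since, start of the current hour)."""
    since -= since % 3600            # align the window to an hour boundary
    current_hour = now - now % 3600  # the current hour is still incomplete
    with conn:  # one transaction: drop and rebuild the affected hours
        conn.execute(
            "DELETE FROM stats_hour WHERE hour_start >= ? AND hour_start < ?",
            (since, current_hour),
        )
        conn.execute(
            """
            INSERT INTO stats_hour (hour_start, samples, ingress, egress)
            SELECT (created_time / 3600) * 3600, count(*), sum(ingress), sum(egress)
            FROM pm_server_stat
            WHERE created_time >= ? AND created_time < ?
            GROUP BY (created_time / 3600) * 3600
            """,
            (since, current_hour),
        )


conn.executemany(
    "INSERT INTO pm_server_stat VALUES (?, ?, ?)",
    [(10, 100, 10), (20, 200, 20), (3700, 300, 30), (7300, 400, 40)],
)
rollup_hours(conn, since=0, now=7400)

# A row with a "late" timestamp arrives; re-running over a lag window fixes the hour.
conn.execute("INSERT INTO pm_server_stat VALUES (3800, 50, 5)")
rollup_hours(conn, since=3600, now=7400)

rows = conn.execute(
    "SELECT hour_start, samples, ingress, egress FROM stats_hour ORDER BY hour_start"
).fetchall()
print(rows)
```

Deleting and re-inserting the hours inside the window is what allows for recalculation: the job can be re-run over the last few hours without double counting. The row at 7300 is left out because its hour is not finished yet.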

Then you can just have a union of the summary table with an actual sum of all the rows since the start of the current hour. That's much less data to query and can be as fast as you might need.
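The union could look something like the sketch below. Again this is only an assumption-laden illustration: SQLite (via Python's sqlite3) stands in for PostgreSQL, created_time is assumed to be epoch seconds, stats_hour is assumed to be already filled by the summary job, and the ingress/egress columns are hypothetical. UNION ALL is used because the two parts cannot overlap.

```python
import sqlite3

# SQLite stands in for PostgreSQL; table layouts are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pm_server_stat (
        created_time INTEGER NOT NULL,  -- assumed: Unix epoch seconds
        ingress      INTEGER NOT NULL,  -- assumed column
        egress       INTEGER NOT NULL   -- assumed column
    );
    CREATE TABLE stats_hour (
        hour_start INTEGER PRIMARY KEY,
        samples    INTEGER NOT NULL,
        ingress    INTEGER NOT NULL,
        egress     INTEGER NOT NULL
    );
    -- Completed hours, as a summary job would have written them.
    INSERT INTO stats_hour VALUES (0, 2, 300, 30), (3600, 1, 300, 30);
    -- Raw rows: one old row (already summarized) and two from the current hour.
    INSERT INTO pm_server_stat VALUES (3700, 300, 30), (7300, 400, 40), (7350, 100, 10);
""")

now = 7400
current_hour = now - now % 3600  # start of the hour that is still in progress

# Summary rows for finished hours, plus one live row aggregated from the raw table.
result = conn.execute(
    """
    SELECT hour_start, samples, ingress, egress
    FROM stats_hour
    UNION ALL
    SELECT ?, count(*), coalesce(sum(ingress), 0), coalesce(sum(egress), 0)
    FROM pm_server_stat
    WHERE created_time >= ?
    ORDER BY hour_start
    """,
    (current_hour, current_hour),
).fetchall()
print(result)
```

Only the raw rows since the start of the current hour are aggregated on the fly, so an ordinary index on created_time is enough to keep that part cheap.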

1 Comment

Yeah, basically you are saying "run a data warehouse", a.k.a. a star schema. You are right; my intention was to do a "sum(ingress), sum(egress)".
