I have a table in PostgreSQL with ~330M rows in it. PostgreSQL itself is running on a VMware VM with 24GB of RAM and 4 cores (Postgres version: Ubuntu 14.2-1.pgdg20.04+1). The table is quite wide (42 columns) but is mainly made up of int4 columns, with a few timestamps, uuids and one float in there. The primary key of the table is a pair of int4s: SourceId and Id.
The table was copied from SQL Server, hence all the mixed-case names.
This query:
select "SourceId", count(*), min("Timestamp"), max("Timestamp")
from "Data"
group by "SourceId"
order by "SourceId" desc;
takes about 75s to return 171 rows.
SQL Server, on the same VM host with the same 24GB of RAM and 4 cores, takes ~40s for the same query.
All timings are for the second and subsequent runs (running with a cold cache takes PostgreSQL ~7m).
There is a unique index on (SourceId, Id) that also covers Timestamp:
CREATE UNIQUE INDEX "IDX_Data_SourceId_Id"
ON "Data" USING btree ("SourceId", "Id")
INCLUDE ("Timestamp");
On SQL Server the indexing is slightly different: Timestamp is the clustered index and there is a non-clustered PK index on (SourceId, Id), with the result that those three columns are in an index together.
What I'm trying to establish is whether PostgreSQL is going as fast as it can.
When I run the query on PostgreSQL and monitor with top, I see a single process hitting 100% CPU usage. However, when I run
explain analyze
select "SourceId", count(*), min("Timestamp"), max("Timestamp")
from "Data"
group by "SourceId"
order by "SourceId" desc;
I see 4-5 processes hitting 100% CPU usage, the query finishes in about 26s, and the execution plan says that it uses four workers:
Finalize GroupAggregate  (cost=1000.63..4789670.43 rows=130 width=28) (actual time=577.661..26051.059 rows=171 loops=1)
  Group Key: "SourceId"
  ->  Gather Merge  (cost=1000.63..4789663.93 rows=520 width=28) (actual time=577.586..26050.461 rows=744 loops=1)
        Workers Planned: 4
        Workers Launched: 4
        ->  Partial GroupAggregate  (cost=0.57..4788601.93 rows=130 width=28) (actual time=373.923..21446.380 rows=149 loops=5)
              Group Key: "SourceId"
              ->  Parallel Index Only Scan Backward using "IDX_Data_SourceId_Id" on "Data"  (cost=0.57..3948553.35 rows=84004728 width=12) (actual time=0.181..14320.058 rows=67217960 loops=5)
                    Heap Fetches: 10315
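As a sanity check on how to read that plan: as far as I understand, the row counts explain analyze reports for a node with loops greater than 1 are per-loop averages, so the total is roughly rows × loops. A quick calculation using only the numbers from the index scan line above:

```sql
-- Numbers copied from the Parallel Index Only Scan line in the plan above.
-- EXPLAIN ANALYZE reports rows as an average per loop, so multiply by loops
-- (4 workers + the leader = 5) to approximate the total rows scanned.
select 67217960::bigint * 5 as approx_rows_scanned;
```

That comes to roughly 336M, which lines up with the ~330M rows in the table, so the parallel plan appears to scan the whole index once, split across the five processes.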
Using the pg_show_plans plugin I also have the plan for the running query:
Finalize GroupAggregate  (cost=1000.63..4789670.43 rows=130 width=28)
  Group Key: "SourceId"
  ->  Gather Merge  (cost=1000.63..4789663.93 rows=520 width=28)
        Workers Planned: 4
        ->  Partial GroupAggregate  (cost=0.57..4788601.93 rows=130 width=28)
              Group Key: "SourceId"
              ->  Parallel Index Only Scan Backward using "IDX_Data_SourceId_Id" on "Data"  (cost=0.57..3948553.35 rows=84004728 width=12)
which also says it planned to use 4 workers.
There are no other queries running on the instance at all.
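One possible way to check whether any workers are actually launched for the plain query would be to run something like the following from a second session while it is executing. This is only a sketch, not output from this system; it assumes PostgreSQL 13 or later (for the leader_pid column in pg_stat_activity), and the list of settings is just the ones that seem most likely to matter:

```sql
-- Parallel worker backends currently running, with the leader they belong to.
select pid, leader_pid, backend_type, state, left(query, 60) as query_start
from pg_stat_activity
where backend_type = 'parallel worker';

-- Settings that limit or influence parallel query.
select name, setting
from pg_settings
where name in (
    'max_worker_processes',
    'max_parallel_workers',
    'max_parallel_workers_per_gather',
    'parallel_leader_participation'
)
order by name;
```

If the first query returns no rows while the plain select is running, that would suggest the planned workers are never launched, rather than being launched and sitting idle.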
So, why does the explain analyze run use four workers (and four cores), while the actual run appears to use only one?
Thanks, and sorry for the very long background to a very simple question.