I just noticed Postgres (checked on version 13 and 14) behavior that surprised me. I have a simple table volume with id and unique text column name. Second table dir has 3 columns: id, volume_id and path. This table is partitioned by hash on volume_id column. Here is full table schema with sample data:
CREATE TABLE dir (
id BIGSERIAL,
volume_id BIGINT,
path TEXT
) PARTITION BY HASH (volume_id);
CREATE TABLE dir_0
PARTITION OF dir FOR VALUES WITH (modulus 3, remainder 0);
CREATE TABLE dir_1
PARTITION OF dir FOR VALUES WITH (modulus 3, remainder 1);
CREATE TABLE dir_2
PARTITION OF dir FOR VALUES WITH (modulus 3, remainder 2);
CREATE TABLE volume(
id BIGINT,
name TEXT UNIQUE
);
INSERT INTO volume (id, name) VALUES (1, 'vol1'), (2, 'vol2'), (3, 'vol3');
INSERT INTO dir (volume_id, path) SELECT i % 3 + 1, 'path_' || i FROM generate_series(1,1000) AS i;
Now, given the volume name, I need to find all the rows from the dir table on that volume. I can do that in 2 different ways.
Query #1
EXPLAIN ANALYZE
SELECT * FROM dir AS d
INNER JOIN volume AS v ON d.volume_id = v.id
WHERE v.name = 'vol1';
Which produces query plan:
| QUERY PLAN |
|---|
| Hash Join (cost=1.05..31.38 rows=333 width=37) (actual time=0.186..0.302 rows=333 loops=1) |
| Hash Cond: (d.volume_id = v.id) |
| -> Append (cost=0.00..24.00 rows=1000 width=24) (actual time=0.006..0.154 rows=1000 loops=1) |
| -> Seq Scan on dir_0 d_1 (cost=0.00..6.34 rows=334 width=24) (actual time=0.006..0.032 rows=334 loops=1) |
| -> Seq Scan on dir_1 d_2 (cost=0.00..6.33 rows=333 width=24) (actual time=0.006..0.029 rows=333 loops=1) |
| -> Seq Scan on dir_2 d_3 (cost=0.00..6.33 rows=333 width=24) (actual time=0.004..0.026 rows=333 loops=1) |
| -> Hash (cost=1.04..1.04 rows=1 width=13) (actual time=0.007..0.007 rows=1 loops=1) |
| Buckets: 1024 Batches: 1 Memory Usage: 9kB |
| -> Seq Scan on volume v (cost=0.00..1.04 rows=1 width=13) (actual time=0.003..0.004 rows=1 loops=1) |
| Filter: (name = 'vol1'::text) |
| Rows Removed by Filter: 2 |
| Planning Time: 0.500 ms |
| Execution Time: 0.364 ms |
As you can see this query leads to a sequential scan on all 3 partitions of the dir table.
Alternatively, we can write this query like this:
Query #2
EXPLAIN ANALYZE
SELECT * FROM dir AS d
WHERE volume_id = (SELECT id FROM volume AS v WHERE v.name = 'vol1');
In that case we get following query plan:
| QUERY PLAN |
|---|
| Append (cost=1.04..27.54 rows=1000 width=24) (actual time=0.010..0.066 rows=333 loops=1) |
| InitPlan 1 (returns $0) |
| -> Seq Scan on volume v (cost=0.00..1.04 rows=1 width=8) (actual time=0.003..0.004 rows=1 loops=1) |
| Filter: (name = 'vol1'::text) |
| Rows Removed by Filter: 2 |
| -> Seq Scan on dir_0 d_1 (cost=0.00..7.17 rows=334 width=24) (never executed) |
| Filter: (volume_id = $0) |
| -> Seq Scan on dir_1 d_2 (cost=0.00..7.16 rows=333 width=24) (never executed) |
| Filter: (volume_id = $0) |
| -> Seq Scan on dir_2 d_3 (cost=0.00..7.16 rows=333 width=24) (actual time=0.004..0.037 rows=333 loops=1) |
| Filter: (volume_id = $0) |
| Planning Time: 0.063 ms |
| Execution Time: 0.093 ms |
Here we can see that partitions dir_0 and dir_1 have never executed annotation.
My question is:
Why in the first case is there no partition pruning? Postgres already knows that the volume.name column is unique and that it will translate into a single volume_id. I would like to get a good intuition on when partition pruning can happen during query execution.