I have a table in a PostgreSQL 9.2 DB, created and filled as follows:
CREATE TABLE foo( id integer, date date );
INSERT INTO foo
SELECT (id % 10) + 1, now() - (id % 50) * interval '1 day'
FROM generate_series(1, 100000) AS id;
Now, I need to find all pairs (id, date) such that the date is a maximum one among all pairs with the same id. The query is kind of well known and it's common to use the window function called ROW_NUMBER()
SELECT id, date
FROM (
SELECT id, date, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) rn
FROM foo
) sbt
WHERE sbt.rn = 1;
Now, I ask for the plan of that query and figured out that the WindowAgg node requires a table to be sorted first.
Subquery Scan on sbt (cost=11116.32..14366.32 rows=500 width=8) (actual time=71.650..127.809 rows=10 loops=1)
Filter: (sbt.rn = 1)
Rows Removed by Filter: 99990
-> WindowAgg (cost=11116.32..13116.32 rows=100000 width=8) (actual time=71.644..122.476 rows=100000 loops=1)
-> Sort (cost=11116.32..11366.32 rows=100000 width=8) (actual time=71.637..92.081 rows=100000 loops=1)
Sort Key: foo.id, foo.date
Sort Method: external merge Disk: 1752kB
-> Seq Scan on foo (cost=0.00..1443.00 rows=100000 width=8) (actual time=0.006..6.138 rows=100000 loops=1)
As expected, the sorting took the majority of the query execution time and it would definitely helpful to use indexes.
So I create the CREATE INDEX ON foo(id, date) and expect that now it'll be use indexes. But it doesn't. I got the same plan with the external merge and even turning off the sequential scan wasn't useful. I just ended up with Bitmap Index Scan
-> Sort (cost=12745.58..12995.58 rows=100000 width=8) (actual time=69.247..90.003 rows=100000 loops=1)
Sort Key: foo.id, foo.date
Sort Method: external merge Disk: 1752kB
-> Bitmap Heap Scan on foo (cost=1629.26..3072.26 rows=100000 width=8) (actual time=5.359..12.639 rows=100000 loops=1)
-> Bitmap Index Scan on foo_id_date_idx (cost=0.00..1604.26 rows=100000 width=0) (actual time=5.299..5.299 rows=100000 loops=1)
QUESTION
Can WindowAgg ever use indexes for sorting? I think that it can't but don't understand why... GroupAggreagtes can and it gives good performance improving.
work_memfor your session and see if that improves thingswork_memswitches the sorting algorithm toquick sortif the memory is enought to do that. But why can't we use sorting provided with the index onfoo(id, date)along with window functions? With aggregate ones we can.