12

I have a table in a PostgreSQL 9.2 DB, created and filled as follows:

CREATE TABLE foo( id integer, date date );

INSERT INTO foo
SELECT (id % 10) + 1, now() - (id % 50) * interval '1 day'
FROM generate_series(1, 100000) AS id;

Now, I need to find all pairs (id, date) such that the date is a maximum one among all pairs with the same id. The query is kind of well known and it's common to use the window function called ROW_NUMBER()

SELECT id, date
FROM (
    SELECT id, date, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) rn
    FROM foo
) sbt
WHERE sbt.rn = 1;

Now, I ask for the plan of that query and figured out that the WindowAgg node requires a table to be sorted first.

Subquery Scan on sbt  (cost=11116.32..14366.32 rows=500 width=8) (actual time=71.650..127.809 rows=10 loops=1)
  Filter: (sbt.rn = 1)
  Rows Removed by Filter: 99990
    ->  WindowAgg  (cost=11116.32..13116.32 rows=100000 width=8) (actual time=71.644..122.476 rows=100000 loops=1)
          ->  Sort  (cost=11116.32..11366.32 rows=100000 width=8) (actual time=71.637..92.081 rows=100000 loops=1)
                Sort Key: foo.id, foo.date
                Sort Method: external merge  Disk: 1752kB
                ->  Seq Scan on foo  (cost=0.00..1443.00 rows=100000 width=8) (actual time=0.006..6.138 rows=100000 loops=1)

As expected, the sorting took the majority of the query execution time and it would definitely helpful to use indexes.

So I create the CREATE INDEX ON foo(id, date) and expect that now it'll be use indexes. But it doesn't. I got the same plan with the external merge and even turning off the sequential scan wasn't useful. I just ended up with Bitmap Index Scan

->  Sort  (cost=12745.58..12995.58 rows=100000 width=8) (actual time=69.247..90.003 rows=100000 loops=1)
      Sort Key: foo.id, foo.date
      Sort Method: external merge  Disk: 1752kB
      ->  Bitmap Heap Scan on foo  (cost=1629.26..3072.26 rows=100000 width=8) (actual time=5.359..12.639 rows=100000 loops=1)
            ->  Bitmap Index Scan on foo_id_date_idx  (cost=0.00..1604.26 rows=100000 width=0) (actual time=5.299..5.299 rows=100000 loops=1)

QUESTION
Can WindowAgg ever use indexes for sorting? I think that it can't but don't understand why... GroupAggreagtes can and it gives good performance improving.

2
  • Your sorting takes place on disk. Try to increase the work_mem for your session and see if that improves things Commented Oct 14, 2015 at 6:39
  • @a_horse_with_no_name Yes, I know that improving work_mem switches the sorting algorithm to quick sort if the memory is enought to do that. But why can't we use sorting provided with the index on foo(id, date) along with window functions? With aggregate ones we can. Commented Oct 14, 2015 at 6:42

2 Answers 2

10

To match the index you created:

CREATE INDEX ON foo(id, date)

you would have to make that:

ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC NULLS LAST) 

which is the perfect reverse order of ASC.

That aside, you could just run:

SELECT DISTINCT ON (id)
       id, date
FROM   foo
ORDER  BY id, date DESC NULLS LAST;

But that's probably not what you wanted to ask. Either way, I would make the index:

CREATE INDEX ON foo(id, date DESC NULLS LAST)

so that max(date) is the first index entry per id. Related:

Sign up to request clarification or add additional context in comments.

4 Comments

Unfortunately, it doesn't work for me. Are you sure about that or maybe I did something wrong...?
I mean (PARTITION BY id ORDER BY date ASC NULLS LAST)
Awesom, you're just wizard!!! I recreated the index as CREATE INDEX ON foo(id, date DESC NULLS LAST) and it works completely fine (performance improved twice). But couldn't you add a little explanation or some references to read more about why creating indexes as I did doesn't work with window aggregates.
@St.Antario: I added some related answers with a lot more details and links.
1

You can rewrite the logic "the date is a maximum one among all pairs with the same id" to a direct translation:

SELECT id, date
FROM (
    SELECT id, date, MAX(date) OVER (PARTITION BY id) as maxDate
    FROM foo
) sbt
WHERE date = maxDate;

It's not exactly the same as a ROW_NUMBER, but a RANK, it might return multiple rows with the same date

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.