Let's say we have this table with a million rows:
CREATE TABLE stuff(
foo integer,
bar integer
);
INSERT INTO stuff(foo, bar) SELECT
(random()*1000000) :: integer,
(random()*1000000) :: integer
FROM generate_series(1, 1000000);
CREATE INDEX index1 ON stuff(foo ASC, bar ASC);
CREATE INDEX index2 ON stuff(foo ASC, bar DESC);
VACUUM ANALYZE stuff;
This table backs an API endpoint where the user can query the data specifying a custom sort order, possibly on multiple columns, and fetch the data in pages of 100. When fetching a page, the user gets a cursor that encodes the necessary data to get the next page.
When fetching rows with ORDER BY foo ASC, bar ASC, the first query would be:
postgres=# EXPLAIN ANALYZE SELECT * FROM stuff ORDER BY foo ASC, bar ASC LIMIT 100;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..4.79 rows=100 width=8) (actual time=0.104..3.611 rows=100 loops=1)
-> Index Only Scan using index1 on stuff (cost=0.42..43679.92 rows=1000000 width=8) (actual time=0.068..1.297 rows=100 loops=1)
Heap Fetches: 100
Planning Time: 0.150 ms
Execution Time: 4.915 ms
(5 rows)
and a query with a cursor pointing to the middle of the table would be:
postgres=# EXPLAIN ANALYZE SELECT * FROM stuff WHERE (foo, bar) > (543210, 1234) ORDER BY foo ASC, bar ASC LIMIT 100;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..7.14 rows=100 width=8) (actual time=0.142..3.625 rows=100 loops=1)
-> Index Only Scan using index1 on stuff (cost=0.42..30730.05 rows=457498 width=8) (actual time=0.115..1.279 rows=100 loops=1)
Index Cond: (ROW(foo, bar) > ROW(543210, 1234))
Heap Fetches: 100
Planning Time: 0.325 ms
Execution Time: 4.857 ms
(6 rows)
This query is very fast because the optimizer knows how to use the WHERE clause to "jump" into the middle of the index, and only fetch 100 rows from there.
Now, if the user wants to sort by foo ASC, bar DESC, I can't think of any equivalent for WHERE (foo, bar) > (543210, 1234). We have to write the query like this:
postgres=# EXPLAIN ANALYZE SELECT * FROM stuff WHERE foo > 543210 OR (foo = 543210 AND bar < 1234) ORDER BY foo ASC, bar DESC LIMIT 100;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..11.61 rows=100 width=8) (actual time=226.836..230.294 rows=100 loops=1)
-> Index Only Scan using index2 on stuff (cost=0.42..51179.92 rows=457498 width=8) (actual time=226.813..227.959 rows=100 loops=1)
Filter: ((foo > 543210) OR ((foo = 543210) AND (bar < 1234)))
Rows Removed by Filter: 543630
Heap Fetches: 543730
Planning Time: 0.275 ms
Execution Time: 231.521 ms
(7 rows)
Now the query is too slow! Apparently, postgres isn't smart enough to figure out that it can "jump" into the middle of the index, and does a full index scan instead.
Is there any way to make the second query fast, like the first one? It seems to me that it should be theoretically possible.