Below are two almost identical queries; only the LIMIT differs. Nevertheless, the query plans and execution times are totally different: the first query is more than 300 times slower than the second one.
The problem only occurs for a small number of owner_ids: owners with many routes (more than 1,000), none of which has been edited recently. The route table contains 2,806,976 rows. The owner in the example has 4,510 routes.
The database is hosted on Amazon RDS on a server with 34.2 GiB memory, 4 vCPUs and provisioned IOPS (instance type db.m2.2xlarge).
EXPLAIN ANALYZE SELECT
id
FROM
route
WHERE
owner_id = 39127
ORDER BY
edited_date DESC
LIMIT
5
Query plan:
"Limit (cost=0.43..5648.85 rows=5 width=12) (actual time=1.046..12949.436 rows=5 loops=1)"
" -> Index Scan Backward using route_i_edited_date on route (cost=0.43..5368257.28 rows=4752 width=12) (actual time=1.042..12949.418 rows=5 loops=1)"
" Filter: (owner_id = 39127)"
" Rows Removed by Filter: 2351712"
"Total runtime: 12949.483 ms"
EXPLAIN ANALYZE SELECT
id
FROM
route
WHERE
owner_id = 39127
ORDER BY
edited_date DESC
LIMIT
15
Query plan:
"Limit (cost=13198.79..13198.83 rows=15 width=12) (actual time=37.781..37.821 rows=15 loops=1)"
" -> Sort (cost=13198.79..13210.67 rows=4752 width=12) (actual time=37.778..37.790 rows=15 loops=1)"
" Sort Key: edited_date"
" Sort Method: top-N heapsort Memory: 25kB"
" -> Index Scan using route_i_owner_id on route (cost=0.43..13082.20 rows=4752 width=12) (actual time=0.039..32.425 rows=4510 loops=1)"
" Index Cond: (owner_id = 39127)"
"Total runtime: 37.870 ms"
How can I ensure that Postgres uses the index route_i_owner_id?
I already tried the following things:
increasing statistics for edited_date and owner_id:
ALTER TABLE route ALTER COLUMN owner_id SET STATISTICS 1000;
ALTER TABLE route ALTER COLUMN edited_date SET STATISTICS 1000;
running VACUUM ANALYZE on the whole database
Solved with the following composite index:
CREATE INDEX route_i_owner_id_edited_date
ON public.route
USING btree
(owner_id, edited_date DESC);
EXPLAIN ANALYZE SELECT
id
FROM
route
WHERE
owner_id = 39127
ORDER BY
edited_date DESC
LIMIT
5
Query plan:
"Limit (cost=0.43..16.99 rows=5 width=12) (actual time=0.028..0.050 rows=5 loops=1)"
" -> Index Scan using route_i_owner_id_edited_date on route (cost=0.43..15746.74 rows=4753 width=12) (actual time=0.025..0.039 rows=5 loops=1)"
" Index Cond: (owner_id = 39127)"
"Total runtime: 0.086 ms"
REINDEX on the table?
route_i_edited_date.
If ORDER BY xxx LIMIT yyy behaves badly (it often does; the optimiser does not have enough freedom in the outer query), a common trick is to rewrite to row_number() OVER (ORDER BY xxx) AS rn ... WHERE rn <= yyy in a subquery.
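Applied to the query from the question, the row_number() rewrite might look roughly like the sketch below. This has not been run against the route table, and the subquery alias ranked is only an illustrative name. Whether the planner then picks route_i_owner_id rather than scanning route_i_edited_date backward depends on the Postgres version and statistics, so it is worth checking with EXPLAIN ANALYZE.

```sql
-- Sketch only: same result set as
--   ... WHERE owner_id = 39127 ORDER BY edited_date DESC LIMIT 5
-- but the limit is expressed as a filter on a window-function row number.
SELECT id
FROM (
    SELECT id,
           row_number() OVER (ORDER BY edited_date DESC) AS rn
    FROM route
    WHERE owner_id = 39127
) AS ranked          -- "ranked" is a hypothetical alias
WHERE rn <= 5
ORDER BY rn;
```

With the composite index on (owner_id, edited_date DESC) in place, the original ORDER BY ... LIMIT form is already fast, so this rewrite is mainly of interest when adding such an index is not an option.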