
Below are two almost identical queries; only the LIMIT differs. Nevertheless, the query plans and execution times are completely different: the first query is more than 300 times slower than the second.

The problem only occurs for a small number of owner_ids: owners with many routes (more than 1,000), none of which has been edited recently. The table route contains 2,806,976 rows. The owner in the example has 4,510 routes.

The database is hosted on Amazon RDS on a server with 34.2 GiB of memory, 4 vCPUs and provisioned IOPS (instance type db.m2.2xlarge).

EXPLAIN ANALYZE SELECT
    id
FROM
    route
WHERE
    owner_id = 39127
ORDER BY
    edited_date DESC
LIMIT
    5

Query plan:
"Limit  (cost=0.43..5648.85 rows=5 width=12) (actual time=1.046..12949.436 rows=5 loops=1)"
"  ->  Index Scan Backward using route_i_edited_date on route  (cost=0.43..5368257.28 rows=4752 width=12) (actual time=1.042..12949.418 rows=5 loops=1)"
"        Filter: (owner_id = 39127)"
"        Rows Removed by Filter: 2351712"
"Total runtime: 12949.483 ms"

EXPLAIN ANALYZE SELECT
    id
FROM
    route
WHERE
    owner_id = 39127
ORDER BY
    edited_date DESC
LIMIT
    15

Query plan:
"Limit  (cost=13198.79..13198.83 rows=15 width=12) (actual time=37.781..37.821 rows=15 loops=1)"
"  ->  Sort  (cost=13198.79..13210.67 rows=4752 width=12) (actual time=37.778..37.790 rows=15 loops=1)"
"        Sort Key: edited_date"
"        Sort Method: top-N heapsort  Memory: 25kB"
"        ->  Index Scan using route_i_owner_id on route  (cost=0.43..13082.20 rows=4752 width=12) (actual time=0.039..32.425 rows=4510 loops=1)"
"              Index Cond: (owner_id = 39127)"
"Total runtime: 37.870 ms"

How can I ensure that Postgres uses the index route_i_owner_id?

I already tried the following things:

  • increasing statistics for edited_date and owner_id

    ALTER TABLE route ALTER COLUMN owner_id SET STATISTICS 1000;
    ALTER TABLE route ALTER COLUMN edited_date SET STATISTICS 1000;
    
  • running VACUUM ANALYZE on the whole database

Solved with the following composite index:

CREATE INDEX route_i_owner_id_edited_date
  ON public.route
  USING btree
  (owner_id, edited_date DESC);

EXPLAIN ANALYZE SELECT
    id
FROM
    route
WHERE
    owner_id = 39127
ORDER BY
    edited_date DESC
LIMIT
    5

"Limit  (cost=0.43..16.99 rows=5 width=12) (actual time=0.028..0.050 rows=5 loops=1)"
"  ->  Index Scan using route_i_owner_id_edited_date on route  (cost=0.43..15746.74 rows=4753 width=12) (actual time=0.025..0.039 rows=5 loops=1)"
"        Index Cond: (owner_id = 39127)"
"Total runtime: 0.086 ms"
  • Have you tried REINDEX on the table? Commented Feb 26, 2015 at 11:14
  • PostgreSQL should also use both indexes in combination if necessary: "Fortunately, PostgreSQL has the ability to combine multiple indexes (including multiple uses of the same index) to handle cases that cannot be implemented by single index scans." (from documentation). I would still say there's something wrong with the original index route_i_edited_date. Commented Feb 26, 2015 at 11:21
  • I tried REINDEX without success. A composite index solved my case, @Simo Kivistö. Commented Feb 26, 2015 at 11:22
  • If you post the index definitions, there might be another explanation for this behavior. As @SimoKivistö pointed out, Postgres can use multiple indexes. It might be the reverse scan of the edited_date index that takes a long time, in combination with a low memory limit for sort operations, which might force disk-based sorting. Commented Feb 26, 2015 at 11:29
  • BTW: if ORDER BY xxx LIMIT yyy behaves badly (it often does; the optimiser does not have enough freedom in the outer query), a common trick is to rewrite it as row_number() OVER (ORDER BY xxx) AS rn ... WHERE rn <= yyy in a subquery. Commented Feb 26, 2015 at 11:36
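
The row_number() rewrite mentioned in the last comment could look roughly like this for the query in the question. It is only a sketch: the subquery alias ranked is made up for illustration, and whether the planner actually chooses a better plan for it is not guaranteed and would need checking with EXPLAIN ANALYZE.

```sql
-- Number the owner's rows by edited_date inside a subquery,
-- then keep only the first 5 in the outer query.
SELECT id
FROM (
    SELECT
        id,
        row_number() OVER (ORDER BY edited_date DESC) AS rn
    FROM route
    WHERE owner_id = 39127
) AS ranked  -- hypothetical alias
WHERE rn <= 5
ORDER BY rn;
```

Because there is no LIMIT in the outer query, the planner has less reason to bet on stopping an ordered scan of route_i_edited_date early.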

1 Answer


This query is too slow to begin with. It should take less than 1s.

Your first example walks the edited_date index backward, so rows arrive already sorted, and filters each row by owner_id. It had to discard 2,351,712 rows before it found 5 matches.

Your second example uses the owner_id index to fetch all 4,510 rows for the owner, then sorts them by edited_date with a top-N heapsort. Neither approach is ideal.
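
Why the plan flips between LIMIT 5 and LIMIT 15 can be reproduced from the cost numbers in the two plans. The sketch below assumes the planner prices an early-terminated scan by scaling the scan's run cost by the fraction of estimated rows it expects to read; the column aliases are made up for illustration.

```sql
-- startup_cost + (total_cost - startup_cost) * limit / estimated_rows
-- Numbers are copied from the "Index Scan Backward" line of the first plan.
SELECT
    round(0.43 + (5368257.28 - 0.43) * 5  / 4752, 2) AS backward_scan_limit_5,
    round(0.43 + (5368257.28 - 0.43) * 15 / 4752, 2) AS backward_scan_limit_15,
    13198.83 AS sort_plan_limit_15;  -- cost of the Limit node in the second plan
```

The first value matches the 5648.85 shown on the Limit node of the slow plan, which is cheaper than the roughly 13,199 of the sort plan. With LIMIT 15 the same formula gives roughly 16,946, so the sort plan wins. The estimate assumes the owner's rows are spread evenly across edited_date; for an owner whose routes are all old, the backward scan has to read most of the index before it finds any match.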

What would probably speed it up is a composite index on both owner_id and edited_date, which makes sense if this kind of query is run often. This index could also replace one of the other indexes, and perhaps even both.
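
Before dropping either of the older indexes, it may be worth checking whether anything still uses them. A minimal sketch, assuming the standard statistics view pg_stat_user_indexes is available and that its counters have been accumulating over a representative period:

```sql
-- idx_scan counts how often each index has been scanned since the
-- statistics were last reset; an index that stays at 0 under normal
-- load is a candidate for removal.
SELECT
    indexrelname,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'route'
ORDER BY idx_scan;

-- Only after confirming that nothing depends on it:
-- DROP INDEX route_i_owner_id;
```

The composite index leads with owner_id, so it can usually stand in for route_i_owner_id. Whether route_i_edited_date can also go depends on whether any query sorts by edited_date without filtering on owner_id.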


3 Comments

The second query is reasonably fast, 38 ms.
In the composite index, which column should I put first: owner_id or edited_date?
@pieter The column you put first is the one the index serves most efficiently in a single-column query, so if you often query on owner_id alone, it should come first. If you often only sort and limit (for example, ORDER BY edited_date LIMIT 5 without any WHERE clause), it might make sense to put edited_date first, or to create a second index on edited_date alone. You need to test the queries and see how much of a difference it makes.
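
One way to run that test is to compare the plans of the query shapes that matter. The sketch below assumes the composite index is defined as (owner_id, edited_date DESC); the comments describe what one would expect, not measured results.

```sql
-- Equality on the leading column, then rows come out of the index
-- already ordered by the second column: no sort step should be needed.
EXPLAIN SELECT id FROM route
WHERE owner_id = 39127
ORDER BY edited_date DESC
LIMIT 5;

-- A filter on the leading column alone can also use the composite index.
EXPLAIN SELECT id FROM route
WHERE owner_id = 39127;

-- No filter on owner_id: the composite index is not ordered by
-- edited_date across owners, so this shape would still rely on an
-- index that leads with edited_date.
EXPLAIN SELECT id FROM route
ORDER BY edited_date DESC
LIMIT 5;
```

Replacing EXPLAIN with EXPLAIN ANALYZE runs each query and shows actual timings alongside the estimates.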
