PostgreSql Select query performance issue

Question

I have a simple select query:

SELECT * FROM entities WHERE entity_type_id = 1 ORDER BY entity_id

Then I want to get the first 100 results, so I use this:

SELECT * FROM entities WHERE entity_type_id = 1 ORDER BY entity_id LIMIT 100

The problem is that the second query works much slower then the first one. It takes less than a second to execute the first query and more than a minute to execute the second one.

These are execution plans for the queries:

without limit:

Sort  (cost=26201.43..26231.42 rows=11994 width=72)
  Sort Key: entity_id
  ->  Index Scan using entity_type_id_idx on entities  (cost=0.00..24895.34 rows=11994 width=72)
        Index Cond: (entity_type_id = 1)

with limit:

Limit  (cost=0.00..8134.39 rows=100 width=72)
  ->  Index Scan using xpkentities on entities  (cost=0.00..975638.85 rows=11994 width=72)
        Filter: (entity_type_id = 1)

I don't understand why these two plans are so different and why the performance decreases so much. How should I tweak the second query to make it work faster?

I use PostgreSql 9.2.

entity_id is a primary key and there's an index on entity_type_id — hank
– hank, Commented Nov 13, 2013 at 7:50
Then you don't need order by as it is already sorted by entity_id ascending as default. — Kuzgun
– Kuzgun, Commented Nov 13, 2013 at 7:52
@Kuzgun - don't be ridiculous. Nothing is sorted until you specify an "order by". — Richard Huxton
– Richard Huxton, Commented Nov 13, 2013 at 14:12

Richard Huxton · Accepted Answer · 2013-11-13 14:22:09Z

1

You want the 100 smallest entity_id's matching your condition. Now - if those were numbers 1..100 then clearly using the entity_id index is the best way to handle this - everything is pre-sorted. In fact, if the 100 you wanted were in the range 1..200 then it still makes sense. Probably 1..1000 would.

So - PostgreSQL thinks it will find lots of entity_type_id=1 values at the "start" of the table. It estimates a cost of 8134 vs 26231 to filter by type then sort. In your case it is wrong.

Now - either there is some correlation which isn't obvious (that's bad - we can't tell the planner about that at present), or we don't have up-to-date or sufficient stats.

Does an ANALYZE entities make any difference? You can see what values the planner knows about by reading the planner-stats page in the manuals.

answered Nov 13, 2013 at 14:22

Richard Huxton

23.6k5 gold badges43 silver badges53 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

PostgreSql Select query performance issue

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related