7
  • I have a very large table.
  • I have an index on column col1.
  • I would like to get the data ordered by col1.
  • The query plan shows that the index is not used.
  • When I add "LIMIT", it starts to use the index.
  • For large values of "LIMIT", it stops using the index again.

Any clue?

P.S. I would like to get the data clustered by the values of col1 (not necessarily sorted). Any suggestions other than "ORDER BY"?

THANKS !!

1
  • 2
    Where is the result from EXPLAIN ANALYZE? Without it, nobody has a clue why things aren't working as you would think. Commented Nov 19, 2010 at 19:57

2 Answers

8

If you return all rows from the table, an index scan will be slower than a table scan. Why do you think you need the index to be used?

You might try to use

set enable_seqscan = false

in order to disable the sequential scan, but I'm sure that will be slower than with the sequential scan.

ORDER BY is the only method to sort your data. Any other ordering you might see is pure coincidence.

Edit
To clear things up: I do not recommend turning seq scan off. I only posted this as a way to show that the seq scan is indeed faster than the index scan. Once it is turned off, the execution plan using the index scan will most probably be slower than the seq scan, which shows the OP that there is no need for an index scan.
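The comparison described above could be run roughly like this. It is only a sketch: the table `big_table`, its `payload` column, and the index name are hypothetical stand-ins for the OP's schema, and the timings will depend on your data, hardware, and PostgreSQL version.

```sql
-- Hypothetical demo table (names are illustrative, not from the question).
CREATE TABLE big_table AS
SELECT (random() * 100000)::int AS col1, md5(g::text) AS payload
FROM generate_series(1, 1000000) AS g;
CREATE INDEX big_table_col1_idx ON big_table (col1);
ANALYZE big_table;

-- 1. Let the planner choose freely and note the plan and the execution time.
EXPLAIN ANALYZE SELECT * FROM big_table ORDER BY col1;

-- 2. Discourage sequential scans for this transaction only, then compare.
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM big_table ORDER BY col1;
ROLLBACK;  -- the setting reverts when the transaction ends
```

Using `SET LOCAL` inside a transaction keeps the change from leaking into the rest of the session, which fits the point that this is a diagnostic and not a fix.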


5 Comments

Let's assume this is a key->value table. I would like to return all the rows but records having the same key should be consecutive in the result set, but the sorting doesn't really matter.
Then add an "ORDER BY the_key_column" clause. It's the only reliable way to sort your data.
Very bad idea to turn enable_seqscan off, it doesn't solve the real problem.
@Frank: I have specifically written that it will be slower. I just wanted to show the OP a method to verify that the seq scan is indeed faster than the index scan.
Using the index theoretically has the benefit of streaming rows immediately and without memory overhead, instead of waiting until the sort has been computed and stored before anything is sent to the client.
4

In addition to the answer of a_horse_with_no_name:

Using an index is actually two distinct operations. First, the value you want is looked up in the index. The index entry holds the address of the complete record, which is then dereferenced. Both operations are very fast for specific queries.

If you intend to use all or most records anyway, the benefit goes away. If you want all records and you go through the index, it takes longer because every record needs two seeks. It's easier to just run over the whole table without the index, as this takes one seek per row (yes, I know, it's actually less than that because whole blocks are read etc... I just want to keep it simple).
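The LIMIT behaviour from the question can be explored with a sketch like the following. The table `limit_demo`, its columns, and the LIMIT values are made up for illustration. Which plan the planner actually picks depends on table statistics, cost settings, and the PostgreSQL version, so treat the comments as what one would typically expect, not as guaranteed output.

```sql
-- Hypothetical demo table (names are illustrative, not from the question).
CREATE TABLE limit_demo AS
SELECT (random() * 100000)::int AS col1, md5(g::text) AS payload
FROM generate_series(1, 1000000) AS g;
CREATE INDEX limit_demo_col1_idx ON limit_demo (col1);
ANALYZE limit_demo;

-- Few rows wanted: walking the index and stopping early is usually cheapest.
EXPLAIN SELECT * FROM limit_demo ORDER BY col1 LIMIT 10;

-- Many rows wanted: the per-row index lookups add up, so the planner
-- may switch back to a sequential scan followed by a sort.
EXPLAIN SELECT * FROM limit_demo ORDER BY col1 LIMIT 500000;

-- All rows wanted: same trade-off as the large LIMIT.
EXPLAIN SELECT * FROM limit_demo ORDER BY col1;
```

Comparing the estimated costs in the three plans shows where the crossover sits for a given table.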

4 Comments

But using the index saves the sorting time, I suppose.
No, and this whole explanation is about why not.
Also read the "Indexes and ORDER BY" section of the manual for a similar explanation.
I've no idea how Postgres sorts data in memory, but for sequentially reading all/most data from the table I would imagine doing a treesort on-the-fly as the data is read in would be far quicker than the extra level of indirection and cache abuse experienced if using an index with the desired ordering.
