
I am working on this simple example:

=> create table t1 ( a int, b int, c int );
CREATE TABLE

=> insert into t1 select a, a, a from generate_series(1,100) a;
INSERT 0 100

=> create index i1 on t1(b);
CREATE INDEX

=> vacuum t1;
VACUUM

=> explain analyze select b from t1 where b = 10;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..2.25 rows=1 width=4) (actual time=0.016..0.035 rows=1 loops=1)
   Filter: (b = 10)
   Rows Removed by Filter: 99
 Planning Time: 0.082 ms
 Execution Time: 0.051 ms
(5 rows)

You can see that I select only b and filter on b only. I also ran vacuum t1; manually to make sure the visibility map is up to date, so an index-only scan would not need any heap fetches.

But why does PostgreSQL still do a Seq Scan instead of an index-only scan?
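One way to confirm that the planner is choosing the seq scan on cost, not because the index-only scan is unavailable, is to disable sequential scans for the session and compare the estimated costs (a sketch, not part of the original post; the exact numbers will vary):

```sql
-- enable_seqscan is a standard planner setting; turning it off only
-- discourages seq scans, it does not forbid them.
SET enable_seqscan = off;
EXPLAIN SELECT b FROM t1 WHERE b = 10;
-- The plan typically switches to an Index Only Scan here, but with a
-- higher estimated cost than the seq scan's 2.25 shown above.
RESET enable_seqscan;
```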

Edit

After adding more rows, it does an index-only scan:

=> insert into t1 select a, a, a from generate_series(1,2000) a;

=> vacuum t1;

=> explain analyze select b from t1 where b = 10;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Index Only Scan using i1 on t1  (cost=0.28..4.45 rows=10 width=4) (actual time=0.038..0.039 rows=1 loops=1)
   Index Cond: (b = 10)
   Heap Fetches: 0
 Planning Time: 0.186 ms
 Execution Time: 0.058 ms
(5 rows)

It seems like PostgreSQL doesn't like index-only scans when the number of rows is small.

  • Is this something to do with the size of the data, that it's easier to page it into memory and scan? I recall reading something about this years ago. Does the query change with 1,000,000 rows and more columns than ints? Commented Jun 11, 2019 at 20:00
  • @MâttFrëëman yeah, you're right. I updated the question at the same time as your comment. It does do an index-only scan after adding more rows. ^_^ Thanks. Commented Jun 11, 2019 at 20:01
  • Knowledge is very hazy... But I recall this has something to do with table statistics too (or maybe that was on another RDBMS), so the query planner needs a rough idea of them rather than, say, doing a count(*) or size(table) internally; perhaps your vacuum triggered the statistics, though. Commented Jun 11, 2019 at 20:03
  • @MâttFrëëman thanks a lot for the tip. I have upvoted your first comment. :) Commented Jun 11, 2019 at 20:05
  • 100 rows will fit on a single data block, so doing a seq scan will only require a single I/O operation, and the index-only scan would require the same. Use explain (analyze, buffers) to see more details on the blocks (=buffers) needed by the query. Commented Jun 12, 2019 at 0:12
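Following the last comment's suggestion, the buffer usage can be inspected directly (a sketch; the exact counts depend on your data and cache state):

```sql
-- BUFFERS adds per-node block counts to the ANALYZE output.
EXPLAIN (ANALYZE, BUFFERS) SELECT b FROM t1 WHERE b = 10;
-- For the 100-row table, the Seq Scan node should report something like
-- "Buffers: shared hit=1", i.e. a single 8 kB block covered the whole table.
```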

1 Answer


Since nobody wants to provide a detailed explanation, I will write a simple answer here.

From @a_horse_with_no_name:

100 rows will fit on a single data block, so doing a seq scan will only require a single I/O operation and the index only scan would require the same. Use explain (analyze, buffers) to see more details on the blocks (=buffers) needed by the query

From https://www.postgresql.org/docs/current/indexes-examine.html:

It is especially fatal to use very small test data sets. While selecting 1000 out of 100000 rows could be a candidate for an index, selecting 1 out of 100 rows will hardly be, because the 100 rows probably fit within a single disk page, and there is no plan that can beat sequentially fetching 1 disk page.
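The "fit within a single disk page" claim can be checked from the catalog (a small sketch; relpages is an estimate maintained by VACUUM and ANALYZE, so run vacuum t1; first):

```sql
-- relpages: on-disk size in 8 kB pages; reltuples: estimated row count.
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname IN ('t1', 'i1');
-- After the first vacuum, t1 should show relpages = 1:
-- all 100 rows fit in one heap page, so no plan can beat reading it once.
```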
