I am using a Postgres database and trying to see the difference between an index scan and a sequential scan on a table of 1,000,000 rows.

Describe table

\d grades 

Then I ran explain analyze for rows with pid between 10 and 500000:

explain analyze select name from grades where pid between 10 and 500000;

Then I ran explain analyze for rows with pid between 10 and 600000:

explain analyze select name from grades where pid between 10 and 600000;

What is strange to me is that it used an index scan for the first query and a sequential scan for the second, although both queries filter on the same column, which is contained in the index.

3 Answers

33

If you need only a single table row, an index scan is much faster than a sequential scan. If you need the whole table, a sequential scan is faster than an index scan.
Somewhere between those two extremes is the turning point where PostgreSQL switches from one access method to the other.

You can tune random_page_cost to influence the point where a sequential scan is chosen. If you have SSD storage, you should set the parameter to 1.0 or 1.1 (the default is 4.0) to tell PostgreSQL that random page reads, and therefore index scans, are cheaper on your hardware.
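
As a rough sketch of how to try this out, the setting can be changed for the current session only and the query planned again. The table and query are the ones from the question; 1.1 is just the SSD-oriented value suggested above, not a measured optimum for any particular system.

```sql
-- Show the current value (the PostgreSQL default is 4.0).
SHOW random_page_cost;

-- Lower the estimated cost of a random page read, for this session only.
SET random_page_cost = 1.1;

-- Re-run the query from the question to see whether the plan changes.
EXPLAIN ANALYZE SELECT name FROM grades WHERE pid BETWEEN 10 AND 600000;

-- Go back to the configured value.
RESET random_page_cost;
```

If the lower value turns out to suit the hardware, it can be made persistent with ALTER SYSTEM SET random_page_cost = 1.1 followed by SELECT pg_reload_conf().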

1 Comment

Setting random_page_cost = 1.1 worked for me: the planner switched from a 300+ ms sequential scan to a <1 ms index scan.

6

PostgreSQL uses a cost-based optimizer, not a rule-based optimizer. If you take the estimated cost of the index scan, 18693, and scale it up linearly by the ratio of the expected rows between the two plans (which is not exactly what the planner does, but should be a good enough first approximation), you get 22330. That is higher than the estimated cost of the seq scan, 21372, so it chooses the seq scan.

If you scale the index scan's actual time up the same way, you get 89 ms, which is slightly faster than the seq scan actually was. So maybe the planner made a very slight error here, but it is certainly nothing to worry about in practice.

If the difference in run times were a factor of 10, rather than 10%, that might be worth investigating further.
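
Rather than scaling the numbers by hand, one way to compare the two plans directly is to discourage sequential scans for the session and look at the plan the planner would otherwise have rejected. This is only a diagnostic sketch using the query from the question, not something to leave enabled in production, and the alternative plan may turn out to be a bitmap scan rather than a plain index scan.

```sql
-- Plan chosen by the planner (a seq scan in the question).
EXPLAIN ANALYZE SELECT name FROM grades WHERE pid BETWEEN 10 AND 600000;

-- Heavily penalize seq scans so the planner falls back to its next-best plan.
SET enable_seqscan = off;

-- Same query again; compare the estimated cost and the actual time with the plan above.
EXPLAIN ANALYZE SELECT name FROM grades WHERE pid BETWEEN 10 AND 600000;

-- Restore the normal planner behaviour.
RESET enable_seqscan;
```

If the forced plan's estimated cost is higher but its actual time is close to or below that of the seq scan, that matches the "very slight error" reading above.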

1

It's because if the SELECT returns more than approximately 5-10% of all rows in the table, a sequential scan is much faster than an index scan. Your second query hit that threshold because it fetches more rows.

1 Comment

@eshirvana But the first query retrieved approximately 50% of the rows, not 10%, and explain still showed an index scan.
