0

I understand that when the majority of a table is estimated to be required in the result set for a given query, that a sequential scan may be preferred over using an index.

What I'm curious about is how postgres actually reads the pages into memory?

Does it organise them into some kind of ad-hoc in memory index whilst it reads them?

What if the table's too large to fit into memory?

Are there any high level papers on the topic?

(I've done some searching but results are full of blog posts explaining the basics of indexing, not the implementation details of a sequential scan. I expect it's not as straightforward as read into an array when evaluating a join condition over most of a table)

2
  • "Does it organise them into some kind of ad-hoc in memory index?" -- Nope, the whole point is to avoid the index and directly read the heap. Commented Jul 23, 2020 at 18:42
  • What I guess I mean is if you're joining on some random column X, does it have to iterate through each possible row multiple times to find the correct row for each value in the other table, or does it do something more advanced than that? Commented Jul 23, 2020 at 18:46

1 Answer 1

1

What I'm curious about is how postgres actually reads the pages into memory?

The engine reads the whole heap in any order while discarding rows marked as deleted. Hot blocks (already present in the cache) are much faster to process.

Does it organise them into some kind of ad-hoc in memory index whilst it reads them?

No, a sequential scan avoids indexes and reads the heap directly using buffering and the cache.

What if the table's too large to fit into memory?

A sequential scan is pipelined. This means I/O blocks are read as needed. The engine does not need to have the whole heap in memory before it starts processing it. It read a few blocks, then process them and discards them; then it does this again and again until it reads all the blocks of the heap.

Are there any high level papers on the topic?

There should be but, anyway, any good book on query optimization will describe this process in detail.

EDIT For Your Second Question:

What I guess I mean is if you're joining on some random column X, does it have to iterate through each possible row multiple times to find the correct row for each value in the other table, or does it do something more advanced than that?

Well, when you join a couple of tables (or more) the engine query planner produces a plan that includes a "Nested Loop", a "Hash Join", or a "Merge Join" operator. There are more operators but these are the common ones.

  • The Nested Loop Join retrieves rows for the linked table that match the first one. It could perform an index seek or scan on the related table (ideal) or a full table scan (not ideal).

  • The Hash Join hashes the secondary table first (incurring in high startup cost) and then joins fast.

  • The Merge Join sorts both tables by the join key (assuming an equi-join), again incurring in heavy startup cost) and then joins fast (like a zipper).

Sign up to request clarification or add additional context in comments.

3 Comments

Thanks, that makes sense - with a simple filter like x > 2019 it would do a single pass over the table, with a more complex condition like a join it will pre-process to a structure better suited to random access to avoid multiple scans. As a bonus if you had any particular book recommendations please fire away
For single table queries I would recommend "SQL Performance Explained" from Markus Winand. For multi table queries, the subject is much more arid and complex. But you get a lot of bang for the buck with the this book. NOTE: this book doesn't focus in PostgreSQL specifically, but compares Oracle, PostgreSQL, SQL Server, and MySQL.
Yeah, I've actually read his online version of that book in the past and gotten a lot of value out of it. Feel a little silly asking the question now as it's stuff I should have already known - but thank you very much for your answer, it's straightened things out in my mind again

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.