How does PostgreSQL execute a query?

Question

Can anyone explain why PostgreSQL works this way?

If I execute this query:

SELECT
*
FROM project_archive_doc as PAD, project_archive_doc as PAD2
WHERE
PAD.id = PAD2.id

it is a simple JOIN, and the EXPLAIN output looks like this:

Hash Join  (cost=6.85..13.91 rows=171 width=150)
  Hash Cond: (pad.id = pad2.id)
  ->  Seq Scan on project_archive_doc pad  (cost=0.00..4.71 rows=171 width=75)
  ->  Hash  (cost=4.71..4.71 rows=171 width=75)
        ->  Seq Scan on project_archive_doc pad2  (cost=0.00..4.71 rows=171 width=75)

But if I execute this query:

SELECT *
FROM project_archive_doc as PAD
WHERE
PAD.id = (
          SELECT PAD2.id
          FROM project_archive_doc as PAD2
          WHERE
          PAD2.project_id = PAD.project_id
          ORDER BY PAD2.created_at
          LIMIT 1)

there are no joins, and the EXPLAIN output looks like this:

Seq Scan on project_archive_doc pad  (cost=0.00..886.22 rows=1 width=75)
  Filter: (id = (SubPlan 1))
  SubPlan 1
    ->  Limit  (cost=5.15..5.15 rows=1 width=8)
          ->  Sort  (cost=5.15..5.15 rows=1 width=8)
                Sort Key: pad2.created_at
                ->  Seq Scan on project_archive_doc pad2  (cost=0.00..5.14 rows=1 width=8)
                      Filter: (project_id = pad.project_id)

Why is this so, and is there any documentation or articles about this?

The additional WHERE clause PAD2.project_id = PAD.project_id plus the LIMIT 1 make the optimiser believe that very few rows will satisfy the condition (estimate: one row), so it chooses to extract exactly this row. Things may change if the table gets larger. That is, the plan could change, but the result will always be one row. The optimiser is (mostly) always right...
– joop, Commented Apr 22, 2014 at 9:29
@joop, "the result will always be one row": are you talking only about the subquery? The entire query returns more than one row.
– Denis Nikanorov, Commented Apr 22, 2014 at 10:17
@DenisNikanorov These queries are totally different. Why are you comparing them?
– Ihor Romanchenko, Commented Apr 22, 2014 at 10:52
Without knowing the table structure ({id, project_id} seem to be elements of keys) I cannot answer the question. In short: the queries are different, so different plans can be expected. Also, what is the purpose of these attempts? (LIMIT 1 inside a (scalar correlated) subquery looks awkward, IMHO)
– joop, Commented Apr 22, 2014 at 10:52

Craig Ringer · Accepted Answer · 2014-04-22 13:24:17Z

Without table definitions and data it's hard to be specific for this case. In general, PostgreSQL is like most SQL databases in that it doesn't treat SQL as a step-by-step program for how to execute a query. It's more like a description of what you want the results to be and a hint about how you want the database to produce those results.

PostgreSQL is free to actually execute the query however it can most efficiently do so, so long as it produces the results you want.

Often it has several choices about how to produce a particular result. It will choose between them based on cost estimates.
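One way to see this cost-based choice in action is to take the first query from the question and temporarily discourage the plan the planner picked. The sketch below uses the standard enable_hashjoin planner setting. The plans and costs you get depend on your data and PostgreSQL version, so no particular output is assumed here.

```sql
-- Plan the planner prefers by default (a Hash Join in the question's output).
EXPLAIN
SELECT *
FROM project_archive_doc AS pad, project_archive_doc AS pad2
WHERE pad.id = pad2.id;

-- Discourage hash joins for this session only, then look at the
-- next-cheapest alternative the planner falls back to.
SET enable_hashjoin = off;

EXPLAIN
SELECT *
FROM project_archive_doc AS pad, project_archive_doc AS pad2
WHERE pad.id = pad2.id;

-- Restore the default so later queries are planned normally.
RESET enable_hashjoin;
```

Both plans return the same rows. Only the estimated cost differs, and that estimate is what the planner compares when choosing between them.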

It can also "understand" that several different ways of writing a particular query are equivalent, and transform one into another where it's more efficient. For example, it can transform an IN (SELECT ...) into a join, because it can prove they're equivalent.
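As an illustration of such an equivalence, the two queries below ask for the same rows in different ways. This is only a sketch: the table project and its columns id and archived (assumed to be a boolean) are hypothetical and do not appear in the question. Whether the two plans actually match depends on your data and PostgreSQL version.

```sql
-- Assumption: a hypothetical table project(id, archived boolean) exists.

-- Written with IN (SELECT ...).
EXPLAIN
SELECT pad.*
FROM project_archive_doc AS pad
WHERE pad.project_id IN (
    SELECT p.id
    FROM project AS p
    WHERE p.archived
);

-- Written with a correlated EXISTS; it returns the same rows,
-- so the planner is free to treat both as the same kind of join.
EXPLAIN
SELECT pad.*
FROM project_archive_doc AS pad
WHERE EXISTS (
    SELECT 1
    FROM project AS p
    WHERE p.archived
      AND p.id = pad.project_id
);
```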

However, sometimes apparently small changes to a query fundamentally change its meaning, and limit what optimisations/transformations PostgreSQL can make. Adding a LIMIT or OFFSET inside a subquery prevents PostgreSQL from flattening it, i.e. combining it with the outer query by transforming it into a join. It also prevents PostgreSQL from moving WHERE clause entries between the subquery and outer query, because that would change the meaning of the query. Without a LIMIT or OFFSET clause, it can do both of these things because they don't change the query's meaning.
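If the intent of the second query in the question is "the earliest document of each project" (an assumption, since the question does not say so), one way to express it without a LIMIT inside a correlated subquery is PostgreSQL's DISTINCT ON. This is a sketch rather than a guaranteed improvement: compare the plans with EXPLAIN on your own data.

```sql
-- Assumed intent: one row per project_id, the one with the smallest created_at.
-- The IS NOT NULL filter mirrors the original query, where rows with a NULL
-- project_id can never match "PAD2.project_id = PAD.project_id".
SELECT DISTINCT ON (pad.project_id) pad.*
FROM project_archive_doc AS pad
WHERE pad.project_id IS NOT NULL
ORDER BY pad.project_id, pad.created_at;
```

As with the original LIMIT 1 subquery, if several rows of a project share the same earliest created_at, which one is returned is not defined. Adding a tie-breaker such as pad.id to the ORDER BY makes the choice deterministic.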

There's some info on the planner here.
