1

Background story :

I'm working on a query with a lot of left joins (doctrine joined inheritance on an old project).

The query is run against a 9.4 postgres & its structure (lightened) looks like this :

SELECT *
FROM table1 a0_ 
LEFT JOIN table2 a1_ ON a0_.id = a1_.id 
LEFT JOIN table3 a2_ ON a0_.id = a2_.id 
WHERE a0_.created_at <= '2019-11-25 15:09:33' LIMIT 5

This query is slowed down by the limit (on the original huge query) because, from my understanding, it first joins on all tables before performing the limit. I already figured out a way to fix this behavior by moving where & limit in a subquery acting as source table, thus reducing original data pool (improving performance by around 4).

Problem faced :

In order to fully understand what happened behind, I analyzed the original query, which outputs this (you can also get a beautified view here) :

Limit  (cost=0.42..24.49 rows=1 width=10536) (actual time=0.129..0.546 rows=5 loops=1)
  ->  Nested Loop Left Join  (cost=0.42..24.49 rows=1 width=10536) (actual time=0.113..0.462 rows=5 loops=1)
        ->  Nested Loop Left Join  (cost=0.27..16.31 rows=1 width=9976) (actual time=0.071..0.285 rows=5 loops=1)
              ->  Index Scan using table1_pkey on table1 a0_  (cost=0.12..8.14 rows=1 width=9428) (actual time=0.022..0.063 rows=5 loops=1)
                    Filter: (created_at <= '2019-11-25 15:09:33'::timestamp without time zone)
              ->  Index Scan using table2_pkey on table2 a1_  (cost=0.14..8.16 rows=1 width=548) (actual time=0.012..0.015 rows=1 loops=5)
                    Index Cond: ((a0_.id)::text = (id)::text)
        ->  Index Scan using table3_pkey on table3 a2_  (cost=0.14..8.16 rows=1 width=560) (actual time=0.010..0.010 rows=0 loops=5)
              Index Cond: ((a0_.id)::text = (id)::text)

On the new optimized query (with where & limit in subquery), the explain shows the limit being made right after the where filter.

What I don't understand on this original query's explain is that despite limit being "runned" last, it seems to already be active on the first statement run which only outputs 5 rows (against 8 without limit).

Could someone explain why ?

1
  • 2
    The execution plan is run as a DAG (directed acyclic graph). It is not a synchronizes process where one step has to complete before another. The rows can filter out through all the nodes. When 5 is hit, then the lower levels do not need to do any more work. Commented Nov 26, 2019 at 16:49

1 Answer 1

1

PostgreSQL's executor works "on demand". When the top node of the execution plan needs to output another row, it will request more rows from the nodes below it.

This propagates down to the lower nodes. Processing stops when the "Limit" node has reached its limit, so the lower nodes need only produce a fraction of the rows they could produce.

Let me explain step by step what happens when the execution plan produces the first result row:

  • To produce the first result row, the Limit node has to get one result from the outer Nested Loop Left Join.

  • To produce the first row, that outer nested loop has to get the first result from the inner Nested Loop Left Join.

  • The inner nested loop join first has to fetch the first row from the index scan on table1. Then it fetches one matching row (if any) from the index scan on table2 (this is the first loop). These two rows get joined to produce the first result row of the inner Nested Loop Left Join.

  • Now the outer Nested Loop Left Join fetches the first matching row (if any) from the index scan on table3. It finds few (the average rows=0). This is loop number 1. Even if it finds no matching row, it will produce a result row, since this is an outer join.

Repeat the above five times until the Limit is done, and you will end up with the counts you see in the execution plan.

Sign up to request clarification or add additional context in comments.

4 Comments

The top node being the most indented one, right ? Since its the first to be executed. If so, the limit node might never be "reached", no ?
No, the top node is the one on top, the Limit.
What confuses me is that from my understanding, the plan is supposed to be read from the inside out, so the limit would be last to be "executed", yet you're telling me it's the top node, which impacts the lower nodes. It probably has something to do with the asynchronous process @gordon-linoff was talking about but i'm not totally grasping it
The "inside" nodes will produce a row first, but execution is triggered top down. I tried to add a detailed explanation of the processing in my extended answer.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.