
I have rewritten the query below from using a correlated subquery to using a LATERAL JOIN. It is now faster, and I would like to understand why.

This SO answer says that there is no golden rule and that it really depends on the situation. Still, is there some intuition for when one form beats the other?

In case it is relevant, this query is used as a VIEW.

The old query, with a correlated subquery (well, technically two):

SELECT
  ts.id,
  ts.val1,
  tv.val2
FROM table_static AS ts
JOIN table_version AS tv ON ts.id = tv.id
  AND tv.effective_time = (
    SELECT MAX(tv1.effective_time)
    FROM table_version AS tv1
    WHERE
      tv1.id = ts.id
      AND
      tv1.effective_time <= CLOCK_TIMESTAMP()
  )
  AND tv.create_time = (
    SELECT MAX(tv2.create_time)
    FROM table_version AS tv2
    WHERE
      tv2.id = tv.id
      AND
      tv2.effective_time = tv.effective_time
      AND
      tv2.create_time <= CLOCK_TIMESTAMP()
  )
JOIN table_status AS t_status ON tv.status_id = t_status.id
WHERE t_status.status != 'deleted'
LIMIT 1000;

Query plan (correlated subquery):

Limit  (cost=4.96..13876200.85 rows=141 width=64) (actual time=0.078..10.788 rows=1000 loops=1)
  ->  Nested Loop  (cost=4.96..13876200.85 rows=141 width=64) (actual time=0.077..10.641 rows=1000 loops=1)
        Join Filter: (tv.status_id = t_status.id)
        ->  Nested Loop  (cost=4.96..13876190.47 rows=176 width=64) (actual time=0.065..10.169 rows=1000 loops=1)
              ->  Seq Scan on table_static ts  (cost=0.00..17353.01 rows=1000001 width=32) (actual time=0.010..0.176 rows=1000 loops=1)
              ->  Index Scan using table_version_pkey on table_version tv  (cost=4.96..13.85 rows=1 width=40) (actual time=0.005..0.006 rows=1 loops=1000)
                    Index Cond: ((id = ts.id) AND (effective_time = (SubPlan 2)))
                    Filter: (create_time = (SubPlan 4))
                    SubPlan 4
                      ->  Result  (cost=8.46..8.47 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1000)
                            InitPlan 3 (returns $4)
                              ->  Limit  (cost=0.43..8.46 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1000)
                                    ->  Index Only Scan Backward using table_version_pkey on table_version tv2  (cost=0.43..8.46 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1000)
                                          Index Cond: ((id = tv.id) AND (effective_time = tv.effective_time) AND (create_time IS NOT NULL))
                                          Filter: (create_time <= clock_timestamp())
                                          Heap Fetches: 0
                    SubPlan 2
                      ->  Result  (cost=4.52..4.53 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1000)
                            InitPlan 1 (returns $1)
                              ->  Limit  (cost=0.43..4.52 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1000)
                                    ->  Index Only Scan Backward using table_version_pkey on table_version tv1  (cost=0.43..8.61 rows=2 width=8) (actual time=0.002..0.002 rows=1 loops=1000)
                                          Index Cond: ((id = ts.id) AND (effective_time IS NOT NULL))
                                          Filter: (effective_time <= clock_timestamp())
                                          Heap Fetches: 0
        ->  Materialize  (cost=0.00..1.08 rows=4 width=8) (actual time=0.000..0.000 rows=1 loops=1000)
              ->  Seq Scan on table_status t_status  (cost=0.00..1.06 rows=4 width=8) (actual time=0.006..0.006 rows=1 loops=1)
                    Filter: (status <> 'deleted'::text)
Planning Time: 0.827 ms
Execution Time: 10.936 ms

The new query with a LATERAL JOIN:

SELECT
  ts.id,
  ts.val1,
  tv.val2
FROM table_static AS ts
JOIN LATERAL (
  SELECT *
  FROM table_version AS tv
  WHERE
    ts.id = tv.id
    AND
    tv.effective_time <= CLOCK_TIMESTAMP()
    AND
    tv.create_time <= CLOCK_TIMESTAMP()
  ORDER BY
    tv.effective_time DESC,
    tv.create_time DESC
  LIMIT 1
) AS tv ON TRUE
JOIN table_status AS t_status ON tv.status_id = t_status.id
WHERE t_status.status != 'deleted'
LIMIT 1000;

Query plan (LATERAL JOIN):

Limit  (cost=0.43..40694.36 rows=1000 width=64) (actual time=0.218..4.431 rows=1000 loops=1)
  ->  Nested Loop  (cost=0.43..32555183.83 rows=800001 width=64) (actual time=0.217..4.280 rows=1000 loops=1)
        Join Filter: (tv.status_id = t_status.id)
        ->  Nested Loop  (cost=0.43..32502382.70 rows=1000001 width=64) (actual time=0.189..3.815 rows=1000 loops=1)
              ->  Seq Scan on table_static ts  (cost=0.00..17353.01 rows=1000001 width=32) (actual time=0.059..0.297 rows=1000 loops=1)
              ->  Limit  (cost=0.43..32.46 rows=1 width=48) (actual time=0.003..0.003 rows=1 loops=1000)
                    ->  Index Scan Backward using table_version_pkey on table_version tv  (cost=0.43..32.46 rows=1 width=48) (actual time=0.003..0.003 rows=1 loops=1000)
                          Index Cond: (id = ts.id)
                          Filter: ((effective_time <= clock_timestamp()) AND (create_time <= clock_timestamp()))
        ->  Materialize  (cost=0.00..1.08 rows=4 width=8) (actual time=0.000..0.000 rows=1 loops=1000)
              ->  Seq Scan on table_status t_status  (cost=0.00..1.06 rows=4 width=8) (actual time=0.021..0.021 rows=1 loops=1)
                    Filter: (status <> 'deleted'::text)
Planning Time: 1.315 ms
Execution Time: 4.746 ms

For both cases the following indices (primary keys) exist:

ALTER TABLE ONLY table_static
    ADD CONSTRAINT table_static_pkey PRIMARY KEY (id);
ALTER TABLE ONLY table_version
    ADD CONSTRAINT table_version_pkey PRIMARY KEY (id, effective_time, create_time);
ALTER TABLE ONLY table_status
    ADD CONSTRAINT table_status_pkey PRIMARY KEY (id);

Maybe the answer is simply that there is one less "subquery"? To my understanding, both queries can make use of the indices.

If there are any other ways I might optimize this query, I would love to hear them.

  • Maybe it's more about how the LATERAL JOIN changes the execution plan and allows PostgreSQL to make better use of indexes and reduce the overall execution complexity. dba.stackexchange.com/questions/237219/… Commented Feb 18, 2024 at 19:51
  • @AmiraBedhiafi Thanks for your comment! The answer to the question you shared says: "The difference in plan between the two is unlikely to have any practical significance.". While here in my example, there clearly seems to be a practical significance. From what I'm reading online, it seems like most people are of the opinion that it shouldn't matter that much. Hence my question :) Commented Feb 18, 2024 at 19:58
  • I'd say there's no one-size-fits-all rule for when to use correlated subqueries versus LATERAL JOINs, but your experience illustrates that LATERAL JOINs can offer significant performance benefits in SOME situations, where they allow the query planner to optimize the execution plan more effectively. Commented Feb 18, 2024 at 20:08
  • 3
    Your reference is to a MySQL answer. That's only loosely relevant for Postgres performance optimization. Both query engines are very different beasts. Elephants and dolphins don't respond to the same incentives. Commented Feb 18, 2024 at 20:50

2 Answers


For simple cases, a correlated subquery is typically slightly faster than a LATERAL subquery. LATERAL subqueries are just more versatile.

JOIN LATERAL ... ON true is most probably not what you want. The construct equivalent to a correlated subquery is LEFT JOIN LATERAL ... ON true. Your version is a more verbose equivalent of CROSS JOIN LATERAL, which eliminates result rows where the right side produces no row. So your two queries are not logically equivalent; retest with equivalent queries.
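
To make the difference concrete, here is a minimal sketch against your tables: a table_static row with no qualifying table_version row survives the LEFT JOIN variant (with NULL val2), but disappears if you change LEFT JOIN to plain JOIN ... ON true.

SELECT ts.id, tv.val2
FROM   table_static ts
LEFT   JOIN LATERAL (  -- swap in plain JOIN here and unmatched ts rows vanish
   SELECT tv.val2
   FROM   table_version tv
   WHERE  tv.id = ts.id
   ORDER  BY tv.effective_time DESC, tv.create_time DESC
   LIMIT  1
   ) tv ON true;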

And why clock_timestamp()? Most probably wrong. We'd need to see the exact table definition and your rationale for that. clock_timestamp() changes during statement execution, which can make queries much more expensive, besides being the wrong tool most of the time.
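
now() is frozen for the duration of the transaction (it is an alias of transaction_timestamp()), while clock_timestamp() keeps moving. A quick demo; the pg_sleep() only makes the drift obvious:

BEGIN;
SELECT now() AS txn_time, clock_timestamp() AS wall_time;
SELECT pg_sleep(2);
SELECT now() AS txn_time, clock_timestamp() AS wall_time;  -- txn_time unchanged, wall_time ~2 s later
COMMIT;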

Finally, your "correlated" version really uses two distinct correlated subqueries. The second one doubles the work without need, making it more expensive. Use a single correlated subquery instead.

Apart from that, your index table_version_pkey is pretty perfect for the given query, and both LATERAL and correlated will be very fast.
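The column order (id, effective_time, create_time) matches the equality on id plus the descending sort on the remaining two columns, so Postgres can fetch the top row per id with a single backward index scan. That's exactly the Index Scan Backward node appearing in both of your plans.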

Assuming there is a proper 1:1 relationship between table_static and table_status, and a 1:n relationship between table_static and table_version, compare these (the correlated variant fetches the whole table_version row as a composite, so that status_id remains reachable for the outer join):

-- correlated
SELECT id, val1, (tv).val2
FROM  (
   SELECT ts.id, ts.val1
        , (SELECT tv  -- whole row as composite
           FROM   table_version tv
           WHERE  ts.id = tv.id
           AND    tv.effective_time <= now()
           AND    tv.create_time    <= now()
           ORDER  BY tv.effective_time DESC, tv.create_time DESC
           LIMIT  1
          ) AS tv
   FROM   table_static ts
   ) sub
JOIN   table_status t_status ON (tv).status_id = t_status.id
WHERE  t_status.status <> 'deleted'
LIMIT  1000;
-- lateral
SELECT ts.id, ts.val1, tv.val2
FROM   table_static ts
LEFT   JOIN LATERAL (
   SELECT tv.val2, tv.status_id
   FROM   table_version tv
   WHERE  ts.id = tv.id
   AND    tv.effective_time <= now()
   AND    tv.create_time    <= now()
   ORDER  BY tv.effective_time DESC, tv.create_time DESC
   LIMIT  1
   ) tv ON true
JOIN   table_status t_status ON tv.status_id = t_status.id
WHERE  t_status.status <> 'deleted'
LIMIT  1000;

Both should be very fast, a bit faster than your previous best test candidate, with the correlated version slightly ahead.


One more issue: LIMIT 1000. Actually, two issues with that:

  1. Typically you display one page of rows, which would be 10, 20, 50, 100, but not 1000. (Maybe you really need 1000; then disregard this point.) You wrapped the query into a VIEW. If you then select rows from that view with another (smaller) LIMIT, you get a sub-optimal query plan; Postgres can typically find a different (faster) plan for LIMIT 10. Use the actual LIMIT (and OFFSET) you need in a customized query: remove the VIEW from the food chain, or remove the LIMIT from the VIEW. If you want paging, look into "keyset pagination"; see the sketch after this list.

  2. LIMIT without ORDER BY produces arbitrary results. Results can differ between two executions, even without changing anything else. Typically, you want deterministic results, defined by given criteria; then optimize the query (and indexes) for that.
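
A minimal keyset-pagination sketch, assuming pages are ordered by id; $1 stands for the last id of the previous page:

-- page 1
SELECT id, val1
FROM   table_static
ORDER  BY id
LIMIT  50;

-- every following page: continue after the last id seen, instead of OFFSET
SELECT id, val1
FROM   table_static
WHERE  id > $1
ORDER  BY id
LIMIT  50;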


7 Comments

Thank you so much for your detailed answer! To answer some of your points: clock_timestamp() is used on purpose, so that table_version rows that are inserted during a transaction will be visible. With now(), they wouldn't be. However, I was unaware of the fact that now() is faster. So I will test this and reevaluate the above requirement if the difference is significant.
*"... so that table_version rows that are inserted during a transaction will be visible." That doesn't work to begin with. In default READ COMMITTED transaction isolation, every command sees what has been committed before it started. The use of clock_timestamp() doesn't buy you anything there. But it can (theoretically) make query results inconsistent, and it can make queries more expensive, because we are dealing with potentially many different values now, disabling certain optimization. Use CURRENT_TIMESTAMP or LOCALTIMESTAMP instead.
You should have included exact table definitions and cardinalities to begin with, to save time. Either way, your "lateral" version presents the same logic: consider only the (one!) latest row from table_version per table_static.id. If there is always at least one, the CROSS JOIN won't do harm. OTOH, there is no good reason to introduce a point of failure without need.
I did miss tv2.effective_time = tv.effective_time in your 2nd corr. sub. So that's not wrong, just very expensive. Fixed my answer there. My suggested queries still stand.
"the LIMIT 1000, is in there just for testing purposes (the test db has ~1,000,000 rows)" Note that Postgres optimizes the query plan for the given LIMIT. Be sure to test the query without LIMIT, too, if that's what you ultimately need.

You could force the LIMIT to happen first:

SELECT
  ts.id,
  ts.val1,
  tv.val2
FROM (
    SELECT ts.*
    FROM table_static AS ts
    LIMIT 1000  -- LIMIT without ORDER BY is non-deterministic
) ts
JOIN LATERAL (
  SELECT *
  FROM table_version AS tv
  WHERE
    ts.id = tv.id
    AND
    tv.effective_time <= CLOCK_TIMESTAMP()
    AND
    tv.create_time <= CLOCK_TIMESTAMP()
  ORDER BY
    tv.effective_time DESC,
    tv.create_time DESC
  LIMIT 1
) AS tv ON TRUE
JOIN table_status AS t_status ON tv.status_id = t_status.id
WHERE t_status.status != 'deleted';
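
The point of the inner LIMIT is to pin the row count down for the planner: the lateral probe and the subsequent joins are then planned for exactly 1000 table_static rows, instead of being costed against the whole table as in your original plan.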

An even better version might be to use window functions.

SELECT
  ts.id,
  ts.val1,
  tv.val2
FROM table_static AS ts
LEFT JOIN (
  SELECT *,
    ROW_NUMBER() OVER (PARTITION BY tv.id ORDER BY tv.effective_time DESC, tv.create_time DESC) AS rn
  FROM table_version AS tv
  WHERE
    tv.effective_time <= CLOCK_TIMESTAMP()
    AND
    tv.create_time <= CLOCK_TIMESTAMP()
) AS tv
   ON tv.id = ts.id
  AND tv.rn = 1
JOIN table_status AS t_status ON tv.status_id = t_status.id
WHERE t_status.status != 'deleted'
LIMIT 1000;  -- LIMIT without ORDER BY is non-deterministic
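
(Whether this wins depends on the data: the derived table has to rank every qualifying table_version row before the join, so on a large table it can end up much slower than the per-id LATERAL probe.)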

1 Comment

Thank you for your answer! The LIMIT 1000 is in there simply for testing the queries against a ~1,000,000-row db. I tried your "window function" query, but it is unfortunately significantly slower than the LATERAL query in my post: the LATERAL query takes ~4 ms and the "window function" query takes ~17 s.
