Postgres query performance is damaged when adding a simple OR clause

Question

An app that I'm working with runs a query like this:

SELECT
   "categories"."id" 
FROM
   "categories" 
WHERE
(
      ('f' ) 
      OR categories.id NOT IN 
      (
         SELECT
            category_id 
         FROM
            category_groups
      )
);

This query takes an extremely long time to finish, and I haven't seen it finish yet. Our "categories" table has 65947 rows and our "category_groups" table has 131,780 rows.

When running "EXPLAIN" on this query, it says that this query will "cost" a ton of work (cost=1000.29..109033834.49).

I edited this query to remove that ('f') OR section of the clause, as shown in the below query:

SELECT
   "categories"."id" 
FROM
   "categories" 
WHERE
(
      categories.id NOT IN 
      (
         SELECT
            category_id 
         FROM
            category_groups
      )
);

This query finishes very quickly, and it's "cost" when EXPLAIN-ing it is much lower (6283.94..10190.09).

Similarly, if I replace the ('f') OR with ('t') OR, the query completes quickly, and the cost goes down (back to 6283.94..10190.09).

Why would adding that ('f') OR clause damage this query's performance so much?

Edit: Here's the full EXPLAIN (VERBOSE) for the query with ('f') OR

 Gather  (cost=1000.29..109033834.49 rows=32952 width=4)
   Output: categories.id
   Workers Planned: 1
   ->  Parallel Index Only Scan using categories_pkey on public.categories  (cost=0.29..109029539.29 rows=19384 width=4)
         Output: categories.id
         Filter: (NOT (SubPlan 1))
         SubPlan 1
           ->  Materialize  (cost=0.00..5295.43 rows=131762 width=4)
                 Output: category_groups.category_id
                 ->  Seq Scan on public.category_groups  (cost=0.00..4121.62 rows=131762 width=4)
                       Output: category_groups.category_id

And here's the full explain for the query without ('f') OR:

 Hash Anti Join  (cost=6283.94..10190.09 rows=1131 width=4)
   Output: categories.id
   Hash Cond: (categories.id = category_groups.category_id)
   ->  Index Only Scan using categories_pkey on public.categories  (cost=0.29..2213.44 rows=65903 width=4)
         Output: categories.id
   ->  Hash  (cost=4121.62..4121.62 rows=131762 width=4)
         Output: category_groups.category_id
         ->  Seq Scan on public.category_groups  (cost=0.00..4121.62 rows=131762 width=4)
               Output: category_groups.category_id

What's the intention behind writing where false or ...? But in general, OR conditions are hard to optimize (for any DBMS, not just Postgres) — user330315
– user330315, Commented Mar 10, 2022 at 20:28
Can we see the full explain, please? What is the purpose of the ('f' ) in the query? — Schwern
– Schwern, Commented Mar 10, 2022 at 21:06
@BrianK I'm surprised it's that slow with only 32952 rows. Have you analyzed the tables? Note that where 'f' or X is just X. 'f', in boolean context, is false; the f does nothing but confuse the optimizer. — Schwern
– Schwern, Commented Mar 10, 2022 at 21:14

Laurenz Albe · Accepted Answer · 2022-03-11 08:24:38Z

3

The FALSE OR prevents PostgreSQL from optimizing your NOT IN condition to an anti-join. The reason is that the PostgreSQL optimizer doesn't think hard enough for that. However, since it is easy to rewrite the query to do better (simple remove the FALSE OR), there is little reason to make the optimizer smarter. A smart optimizer is slow, and the gain of optimizing an ill-written query better does not outweigh the price that everyone would have to pay.

answered Mar 11, 2022 at 8:24

Laurenz Albe

257k22 gold badges312 silver badges388 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Postgres query performance is damaged when adding a simple OR clause

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related