1

I have a query that is being ran on PGSQL, and when queried at a fast rate for large data sets, it is taking a long time to run because it isn't making use of the available indexes. I found that changing the filter from multiple OR's to an IN clause causes the right index to be used. Is there a way I can force the index to be used even when using OR's?

Query with Disjunction:

SELECT field1, field2,..., fieldN
FROM table1 WHERE
((((field9='val1' OR field9='val2') OR field9='val3') OR field9='val4')
AND (field6='val5'));

Query Plan:

"Bitmap Heap Scan on table1  (cost=18.85..19.88 rows=1 width=395) (actual time=0.017..0.017 rows=0 loops=1)"
"  Recheck Cond: (((field6)::text = 'val5'::text) AND (((field9)::text = 'val1'::text) OR ((field9)::text = 'val2'::text) OR ((field9)::text = 'val3'::text) OR ((field9)::text = 'val4'::text)))"
"  ->  BitmapAnd  (cost=18.85..18.85 rows=1 width=0) (actual time=0.016..0.016 rows=0 loops=1)"
"        ->  Bitmap Index Scan on idx_field6_field9  (cost=0.00..9.01 rows=611 width=0) (actual time=0.015..0.015 rows=0 loops=1)"
"              Index Cond: ((field6)::text = 'val5'::text)"
"        ->  BitmapOr  (cost=9.59..9.59 rows=516 width=0) (never executed)"
"              ->  Bitmap Index Scan on idx_id_field9  (cost=0.00..2.40 rows=129 width=0) (never executed)"
"                    Index Cond: ((field9)::text = 'val1'::text)"
"              ->  Bitmap Index Scan on idx_id_field9  (cost=0.00..2.40 rows=129 width=0) (never executed)"
"                    Index Cond: ((field9)::text = 'val2'::text)"
"              ->  Bitmap Index Scan on idx_id_field9  (cost=0.00..2.40 rows=129 width=0) (never executed)"
"                    Index Cond: ((field9)::text = 'val3'::text)"
"              ->  Bitmap Index Scan on idx_id_field9  (cost=0.00..2.40 rows=129 width=0) (never executed)"
"                    Index Cond: ((field9)::text = 'val4'::text)"
"Planning time: 0.177 ms"
"Execution time: 0.061 ms"

Query with IN

SELECT field1, field2,..., fieldN
FROM table1
WHERE
((field9 IN ('val1', 'val2', 'val3', 'val4'))
AND (field6='val5'));

Query Plan:

"Index Scan using idx_field6_field9 on table1  (cost=0.43..6.77 rows=1 width=395) (actual time=0.032..0.032 rows=0 loops=1)"
"  Index Cond: (((field6)::text = 'val5'::text) AND ((field9)::text = ANY ('{val1,val2,val3,val4}'::text[])))"
"Planning time: 0.145 ms"
"Execution time: 0.055 ms"

There is an index on field 6 and field 9 which the second query uses as expected, which the first one also should. Field9 is also kind of like a state field, so its cardinality is extremely low - there's only like 9 different values across the whole table. Unfortunately, it isn't straightforward to change the query to use an IN clause, so getting PG to use the right plan would be ideal.

2
  • "Planning time: 0.177 ms" "Execution time: 0.061 ms" I rest my case... [maybe rewiting the query plan would cost an extra 0.1 ms?] Commented May 2, 2019 at 18:41
  • Ok man, I'm telling you under load on larger data sets, this plan causes significant problems... Commented May 2, 2019 at 18:42

1 Answer 1

1

There is no way you can get the fast plan (single index scan) using the OR condition. You'll have to rewrite the query.

You want to know why, which is always difficult to explain. With optimizations like that, there are usually two reasons:

  1. Nobody got around to do it.

  2. This requires extra effort every time a query with an OR is planned:

    • Are there several conditions linked with OR that have the same expression on one side?

    • Both plans, the original and the rewritten one, would have to be estimated. It may well be that the BitmapOr is the most efficient way to process the query.

    This price would have to be paid by every query with OR in it.

I am not saying that it is a bad idea to add an optimization like this, but there are two sides to the coin.

Sign up to request clarification or add additional context in comments.

3 Comments

Could you explain why? That seem like kinda of a big flaw in PG. Would using a different index type help?
Found this: postgresql.org/message-id/[email protected] We can combine multiple bitmaps via AND/OR operations to merge results from multiple indexes before visiting the table ... but it's still potentially worthwhile even for one index.
PG's strategy is targeted towards a flat AND/OR list, like (a AND b) OR (c AND d) OR ... This could work for the OP, too, IF the single-column index on field6 were absent.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.