I've a booking and a customer table, with following schema:
Booking Table:
Table "public.booking"
Column | Type | Collation | Nullable | Default
-----------------------+--------------------------+-----------+----------+---------
deleted | boolean | | |
booking_id | character varying | | not null |
reference_number | character varying | | |
checkin_date | timestamp with time zone | | |
checkout_date | timestamp with time zone | | |
status | character varying | | |
version | integer | | not null |
comments | text | | |
extra_information | json | | |
cancellation_reason | character varying | | |
cancellation_datetime | timestamp with time zone | | |
created_at | timestamp with time zone | | not null | now()
modified_at | timestamp with time zone | | not null | now()
Indexes:
"booking_pkey" PRIMARY KEY, btree (booking_id)
"ix_booking_reference_number" UNIQUE, btree (reference_number)
"idx_booking_sort_checkin" btree (checkin_date, created_at)
"idx_booking_sort_checkout" btree (checkout_date, created_at)
"idx_booking_stay_dates" btree (checkin_date, checkout_date DESC)
"ix_booking_deleted" btree (deleted)
"ix_booking_status" btree (status)
"trgm_booking_ref_num" gist (reference_number gist_trgm_ops)
Customer Table:
Table "public.booking_customer"
Column | Type | Collation | Nullable | Default
-----------------------+--------------------------+-----------+----------+---------
deleted | boolean | | |
customer_id | character varying | | not null |
booking_id | character varying | | not null |
first_name | character varying | | |
last_name | character varying | | |
phone | character varying | | |
email | character varying | | |
created_at | timestamp with time zone | | not null | now()
modified_at | timestamp with time zone | | not null | now()
Indexes:
"booking_customer_pkey" PRIMARY KEY, btree (customer_id, booking_id)
"book_cust_idx" btree (booking_id, customer_id)
"idx_booking_customer_full_name" btree (((first_name::text || ' '::text) || last_name::text))
"ix_booking_customer_deleted" btree (deleted)
"ix_booking_customer_email" btree (email)
"ix_booking_customer_first_name" btree (first_name)
"ix_booking_customer_last_name" btree (last_name)
"ix_booking_customer_phone" btree (phone)
"trgm_cust_first_name" gist (first_name gist_trgm_ops)
"trgm_cust_full_name" gist (((first_name::text || ' '::text) || last_name::text) gist_trgm_ops)
"trgm_cust_last_name" gist (last_name gist_trgm_ops)
I'm running the following query:
EXPLAIN ANALYZE
SELECT bk.booking_id, bk.created_at, bk.checkin_date
FROM booking bk
WHERE bk.reference_number = '9123889123' OR
EXISTS (
SELECT 1 FROM booking_customer cust
WHERE cust.booking_id = bk.booking_id AND (
cust.email = '9123889123' OR
cust.phone = '9123889123'
) AND
cust.deleted = false
)
ORDER BY bk.checkin_date DESC, bk.created_at DESC
LIMIT 10 OFFSET 0;
This results in following query plan:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..365.54 rows=10 width=31) (actual time=57.861..865.883 rows=3 loops=1)
-> Index Scan Backward using idx_booking_sort_checkin on booking bk (cost=0.42..14419601.66 rows=394937 width=31) (actual time=57.858..865.877 rows=3 loops=1)
Filter: (((reference_number)::text = '9916092871'::text) OR (alternatives: SubPlan 1 or hashed SubPlan 2))
Rows Removed by Filter: 676681
SubPlan 1
-> Bitmap Heap Scan on booking_customer cust (cost=14.08..18.10 rows=1 width=0) (never executed)
Recheck Cond: (((booking_id)::text = (bk.booking_id)::text) AND (((email)::text = '9916092871'::text) OR ((phone)::text = '9916092871'::text)))
Filter: (NOT deleted)
-> BitmapAnd (cost=14.08..14.08 rows=1 width=0) (never executed)
-> Bitmap Index Scan on book_cust_idx (cost=0.00..4.49 rows=8 width=0) (never executed)
Index Cond: ((booking_id)::text = (bk.booking_id)::text)
-> BitmapOr (cost=9.34..9.34 rows=65 width=0) (never executed)
-> Bitmap Index Scan on ix_booking_customer_email (cost=0.00..4.67 rows=33 width=0) (never executed)
Index Cond: ((email)::text = '9916092871'::text)
-> Bitmap Index Scan on ix_booking_customer_phone (cost=0.00..4.67 rows=32 width=0) (never executed)
Index Cond: ((phone)::text = '9916092871'::text)
SubPlan 2
-> Bitmap Heap Scan on booking_customer cust_1 (cost=9.38..264.83 rows=65 width=32) (actual time=0.047..0.050 rows=3 loops=1)
Recheck Cond: (((email)::text = '9916092871'::text) OR ((phone)::text = '9916092871'::text))
Filter: (NOT deleted)
Heap Blocks: exact=3
-> BitmapOr (cost=9.38..9.38 rows=65 width=0) (actual time=0.042..0.042 rows=0 loops=1)
-> Bitmap Index Scan on ix_booking_customer_email (cost=0.00..4.67 rows=33 width=0) (actual time=0.019..0.019 rows=0 loops=1)
Index Cond: ((email)::text = '9916092871'::text)
-> Bitmap Index Scan on ix_booking_customer_phone (cost=0.00..4.67 rows=32 width=0) (actual time=0.023..0.023 rows=3 loops=1)
Index Cond: ((phone)::text = '9916092871'::text)
Planning time: 0.782 ms
Execution time: 865.956 ms
(28 rows)
If you see, there is Filter predicate used by postgres on reference_number and booking_id field, which I've index on.
But, if I remove the OR condition from WHERE clause, it starts using the index:
For this query:
EXPLAIN ANALYZE
SELECT bk.booking_id, bk.created_at, bk.checkin_date
FROM booking bk
WHERE bk.reference_number = '9123889123'
ORDER BY bk.checkin_date DESC, bk.created_at DESC
LIMIT 10 OFFSET 0;
The query plan is like this:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=8.45..8.46 rows=1 width=31) (actual time=0.021..0.021 rows=0 loops=1)
-> Sort (cost=8.45..8.46 rows=1 width=31) (actual time=0.020..0.020 rows=0 loops=1)
Sort Key: checkin_date DESC, created_at DESC
Sort Method: quicksort Memory: 25kB
-> Index Scan using ix_booking_reference_number on booking bk (cost=0.42..8.44 rows=1 width=31) (actual time=0.014..0.014 rows=0 loops=1)
Index Cond: ((reference_number)::text = '9123889123'::text)
Planning time: 0.334 ms
Execution time: 0.042 ms
I'm unable to understand why this behaviour change? reference_number and booking_id has unique index. Alsom what are those 2 subplans in the first query? Could that also be affecting the query performance?
I've created gist index on reference_number, to allow LIKE query index, which I'm using elsewhere.
Is there something that can be changed to improve the query performance?
I've half a million record in booking table, and 2 millions record in customer table.