
I have 20 million records in a table whose schema is shown below:

FieldName   Datatype
id          bigint(Auto Inc,Primarykey)
name        varchar(255)
phone       varchar(255)
deleted_at  timestamp
created_at  timestamp
updated_at  timestamp

It has indexes on the name and phone columns:

Column  Index type
name    GIN trgm index
phone   btree index, GIN trgm index

The indexes were created using the following commands:

CREATE INDEX btree_idx ON contacts USING btree (phone);
CREATE INDEX trgm_idx ON contacts USING GIN (phone gin_trgm_ops);
CREATE INDEX trgm_idx_name ON contacts USING GIN (name gin_trgm_ops);

I am running the following query:

select * from contacts where phone like '%6666666%' limit 15;

I am doing a contains query on phone. The above query takes more than 5 minutes to return a result. Here is the EXPLAIN output:

explain  analyse select * from contacts where phone like '%6666666%' limit 15;
Limit  (cost=1774.88..1830.57 rows=15 width=65) (actual time=7970.553..203001.985 rows=15 loops=1)
  ->  Bitmap Heap Scan on contacts  (cost=1774.88..10819.13 rows=2436 width=65) (actual time=7970.552..203001.967 rows=15 loops=1)
        Recheck Cond: ((phone)::text ~~ '%6666666%'::text)
        Rows Removed by Index Recheck: 254869
        Heap Blocks: lossy=2819
        ->  Bitmap Index Scan on trgm_idx  (cost=0.00..1774.27 rows=2436 width=0) (actual time=6720.978..6720.978 rows=306226 loops=1)
              Index Cond: ((phone)::text ~~ '%6666666%'::text)
Planning Time: 0.139 ms
Execution Time: 203002.791 ms

What can I do to optimize my query? Bringing the result under 5 seconds would be ideal.

  • Try running an ANALYZE on the table to ensure the statistics are up to date. How are the phone numbers stored? If they're normalized, a LIKE may be unnecessary. Commented Nov 2, 2020 at 5:06
  • Added the analyze query to the question. Phone numbers are stored directly in the table, so a LIKE query is required to get the result. The user will type a minimum of 7 digits and I have to show the numbers that contain the user's input. @Schwern Commented Nov 2, 2020 at 6:00
  • @a_horse_with_no_name Done as you suggested; added the text format of the query plan. Commented Nov 2, 2020 at 6:56
  • The estimates are quite off in the plan. Does running analyze contacts; or vacuum analyze contacts; change anything? Nearly 7 seconds to retrieve only 300,000 rows from the index seems rather slow as well. What kind of hard disk does your server have? Commented Nov 2, 2020 at 7:01
  • @a_horse_with_no_name I am using AWS RDS with 20GB General Purpose (SSD), 2GB RAM and 2 vCPUs. The instance type is t3.small and the PostgreSQL version is 12.3. Commented Nov 2, 2020 at 7:22

1 Answer

One cause of the bad performance is probably

Heap Blocks: lossy=2819

Your work_mem setting is too small to contain a bitmap with one bit per table row, so PostgreSQL degrades it to one bit per 8kB block. This leads to many more rechecks than necessary.
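
A quick way to verify this is to raise work_mem for your session only and re-run the query; the value below is just a placeholder, the right size depends on your table and instance:

SHOW work_mem;          -- check the current setting
SET work_mem = '64MB';  -- session-only change; pick a value your instance can afford
EXPLAIN (ANALYSE, BUFFERS)
SELECT * FROM contacts WHERE phone LIKE '%6666666%' LIMIT 15;
-- If "Heap Blocks" now reports exact=... instead of lossy=...,
-- the bitmap fits in memory and the recheck overhead should drop.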

Also, your test case is pathological: the search string contains only the trigram 666, which matches many rows that don't satisfy the query and have to be removed during recheck. A trigram index is not effective in such a case. Test with a number that contains more varied digits.
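
You can see this for yourself with pg_trgm's show_trgm function (a quick illustration; the exact padding in the output may vary):

SELECT show_trgm('6666666');
-- roughly {"  6"," 66","666","66 "}: the only interior trigram is '666',
-- so the index scan has to return every row whose phone contains three sixes.
SELECT show_trgm('6666667');
-- adds a rarer trigram such as '667', so the bitmap index scan can narrow
-- the candidate rows much more effectively.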

6 Comments

The current work_mem was 4MB. I increased it to 16MB and then ran explain (analyse, verbose) select * from contacts where phone like '%6666667%' limit 15; the output is below:
Limit  (cost=578.90..634.59 rows=15 width=65) (actual time=232.579..57587.617 rows=15 loops=1)
  Output: id, name, phone, deleted_at, created_at, updated_at
  ->  Bitmap Heap Scan on public.contacts  (cost=578.90..9630.62 rows=2438 width=65) (actual time=232.577..57587.599 rows=15 loops=1)
        Recheck Cond: ((contacts.phone)::text ~~ '%6666667%'::text)
        Rows Removed by Index Recheck: 3402
        Heap Blocks: exact=3231
        ->  Bitmap Index Scan on trgm_idx  (cost=0.00..578.29 rows=2438 width=0) (actual time=227.403..227.403 rows=29417 loops=1)
              Index Cond: ((contacts.phone)::text ~~ '%6666667%'::text)
Planning Time: 0.097 ms
Execution Time: 57587.660 ms
The performance improved from 75 seconds to 55 seconds. I want to make it under 5 seconds.
See my extended answer.