Need to improve count performance in PostgreSQL for this query

Question

I have this query in PostgreSQL:

SELECT COUNT("contacts"."id") 
FROM "contacts" 
  INNER JOIN  "phone_numbers" ON "phone_numbers"."id" = "contacts"."phone_number_id" 
  INNER JOIN "companies" ON "companies"."id" = "contacts"."company_id"
WHERE (
        (
          (
            CAST("phone_numbers"."value" AS VARCHAR) ILIKE '%a%' 
            OR CAST("contacts"."first_name" AS VARCHAR) ILIKE '%a%'
          ) 
          OR CAST("contacts"."last_name" AS VARCHAR) ILIKE '%a%'
        )  
        OR CAST("companies"."name" AS VARCHAR) ILIKE '%a%'
      )

When I run the query it is taking 19secs to run. I need to improve the performance.

Note: I already have the index for the columns.

EXPLAIN ANALYZE report

Finalize Aggregate  (cost=209076.49..209076.54 rows=1 width=8) (actual time=6117.381..6646.477 rows=1 loops=1)
  ->  Gather  (cost=209076.42..209076.48 rows=4 width=8) (actual time=6117.370..6646.473 rows=5 loops=1)
        Workers Planned: 4
        Workers Launched: 4
        ->  Partial Aggregate  (cost=209066.42..209066.47 rows=1 width=8) (actual time=5952.710..5952.723 rows=1 loops=5)
              ->  Hash Join  (cost=137685.37..208438.42 rows=251200 width=8) (actual time=3007.055..5945.571 rows=39193 loops=5)
                    Hash Cond: (contacts.company_id = companies.id)
                    Join Filter: (((phone_numbers.value)::text ~~* '%as%'::text) OR ((contacts.first_name)::text ~~* '%as%'::text) OR ((contacts.last_name)::text ~~* '%as%'::text) OR ((companies.name)::text ~~* '%as%'::text))
                    Rows Removed by Join Filter: 763817
                    ->  Parallel Hash Join  (cost=137684.86..201964.34 rows=1003781 width=41) (actual time=3006.633..4596.987 rows=803010 loops=5)
                          Hash Cond: (contacts.phone_number_id = phone_numbers.id)
                          ->  Parallel Seq Scan on contacts  (cost=0.00..59316.85 rows=1003781 width=37) (actual time=11.032..681.124 rows=803010 loops=5)
                          ->  Parallel Hash  (cost=68914.22..68914.22 rows=1295458 width=20) (actual time=1632.770..1632.770 rows=803184 loops=5)
                                Buckets: 65536  Batches: 64  Memory Usage: 4032kB
                                ->  Parallel Seq Scan on phone_numbers  (cost=0.00..68914.22 rows=1295458 width=20) (actual time=10.780..1202.242 rows=803184 loops=5)
                    ->  Hash  (cost=0.30..0.30 rows=4 width=40) (actual time=0.258..0.258 rows=4 loops=5)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Seq Scan on companies  (cost=0.00..0.30 rows=4 width=40) (actual time=0.247..0.248 rows=4 loops=5)
Planning Time: 1.895 ms
Execution Time: 6646.558 ms

Please help me on this performance issue.

I tried FUNCTION row_count_estimate (query text) and it is not giving the exact count.

Solution Tried: I tried the Robert solution and got 16 Secs to run

My Query is:

SELECT Count(id) AS id
FROM   (
              SELECT contacts.id AS id
              FROM   contacts 
              WHERE  (
                            contacts.last_name ilike '%as%')
              OR     (
                            contacts.last_name ilike '%as%')
              UNION
              SELECT contacts.id AS id
              FROM   contacts
              WHERE  contacts.phone_number_id IN
                     (
                            SELECT phone_numbers.id AS phone_number_id
                            FROM   phone_numbers
                            WHERE  phone_numbers.value ilike '%as%')
              UNION
              SELECT contacts.id AS id
              FROM   contacts
              WHERE  contacts.company_id IN
                     (
                            SELECT companies.id AS company_id
                            FROM   companies
                            WHERE  companies.name ilike '%as%' )) AS ID

Report:

Aggregate  (cost=395890.08..395890.13 rows=1 width=8) (actual time=5942.601..5942.667 rows=1 loops=1)
  ->  Unique  (cost=332446.76..337963.57 rows=1103362 width=8) (actual time=5929.800..5939.658 rows=101989 loops=1)
        ->  Sort  (cost=332446.76..335205.17 rows=1103362 width=8) (actual time=5929.799..5933.823 rows=101989 loops=1)
              Sort Key: contacts.id
              Sort Method: external merge  Disk: 1808kB
              ->  Append  (cost=10.00..220843.02 rows=1103362 width=8) (actual time=1.158..5900.926 rows=101989 loops=1)
                    ->  Gather  (cost=10.00..61935.48 rows=99179 width=8) (actual time=1.158..569.412 rows=101989 loops=1)
                          Workers Planned: 4
                          Workers Launched: 4
                          ->  Parallel Seq Scan on contacts  (cost=0.00..61826.30 rows=24795 width=8) (actual time=0.446..477.276 rows=20398 loops=5)
                                Filter: ((last_name)::text ~~* '%as%'::text)
                                Rows Removed by Filter: 782612
                    ->  Nested Loop  (cost=0.84..359.91 rows=402 width=8) (actual time=5292.088..5292.089 rows=0 loops=1)
                          ->  Index Scan using idx_phone_value on phone_numbers  (cost=0.41..64.13 rows=402 width=8) (actual time=5292.087..5292.087 rows=0 loops=1)
                                Index Cond: ((value)::text ~~* '%as%'::text)
                                Rows Removed by Index Recheck: 4015921
                          ->  Index Scan using index_contacts_on_phone_number_id on contacts contacts_1  (cost=0.43..0.69 rows=1 width=16) (never executed)
                                Index Cond: (phone_number_id = phone_numbers.id)
                    ->  Gather  (cost=10.36..75795.48 rows=1003781 width=8) (actual time=26.298..26.331 rows=0 loops=1)
                          Workers Planned: 4
                          Workers Launched: 4
                          ->  Hash Join  (cost=0.36..74781.70 rows=250945 width=8) (actual time=3.758..3.758 rows=0 loops=5)
                                Hash Cond: (contacts_2.company_id = companies.id)
                                ->  Parallel Seq Scan on contacts contacts_2  (cost=0.00..59316.85 rows=1003781 width=16) (actual time=0.128..0.128 rows=1 loops=5)
                                ->  Hash  (cost=0.31..0.31 rows=1 width=8) (actual time=0.726..0.727 rows=0 loops=5)
                                      Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                      ->  Seq Scan on companies  (cost=0.00..0.31 rows=1 width=8) (actual time=0.726..0.726 rows=0 loops=5)
                                            Filter: ((name)::text ~~* '%as%'::text)
                                            Rows Removed by Filter: 4
Planning Time: 0.846 ms
Execution Time: 5948.330 ms

I tried the below also:

EXPLAIN ANALYZE  SELECT
        count(id) AS id 
    FROM
        (SELECT
            contacts.id AS id 
        FROM
            contacts 
        WHERE
            (
                position('as' in LOWER(last_name)) > 0
            ) 
        UNION
        SELECT
            contacts.id AS id 
        FROM
            contacts 
        WHERE
            EXISTS (
                SELECT
                    1 
                FROM
                    phone_numbers 
                WHERE
                    (
                        position('as' in LOWER(phone_numbers.value)) > 0
                    ) 
                    AND (
                        contacts.phone_number_id = phone_numbers.id
                    )
            ) 
        UNION 
        SELECT
            contacts.id AS id 
        FROM
            contacts 
        WHERE
            EXISTS (
                SELECT
                    1 
                FROM
                    companies 
                WHERE
                    (
                        position('as' in LOWER(companies.name)) > 0
                    ) 
                    AND (
                        contacts.company_id = companies.id
                    )
            ) 
        UNION DISTINCT SELECT
            contacts.id AS id 
        FROM
            contacts 
        WHERE
            (
                position('as' in LOWER(first_name)) > 0
            )
    ) AS ID;

Report

Aggregate  (cost=1609467.66..1609467.71 rows=1 width=8) (actual time=1039.249..1039.330 rows=1 loops=1)
  ->  Unique  (cost=1320886.03..1345980.09 rows=5018811 width=8) (actual time=999.363..1030.500 rows=195963 loops=1)
        ->  Sort  (cost=1320886.03..1333433.06 rows=5018811 width=8) (actual time=999.362..1013.818 rows=198421 loops=1)
              Sort Key: contacts.id
              Sort Method: external merge  Disk: 3520kB
              ->  Gather  (cost=10.00..754477.62 rows=5018811 width=8) (actual time=0.581..941.210 rows=198421 loops=1)
                    Workers Planned: 4
                    Workers Launched: 4
                    ->  Parallel Append  (cost=0.00..749448.80 rows=5018811 width=8) (actual time=290.521..943.736 rows=39684 loops=5)
                          ->  Parallel Hash Join  (cost=101469.35..164569.24 rows=334587 width=8) (actual time=724.841..724.843 rows=0 loops=2)
                                Hash Cond: (contacts.phone_number_id = phone_numbers.id)
                                ->  Parallel Seq Scan on contacts  (cost=0.00..59315.91 rows=1003762 width=16) (never executed)
                                ->  Parallel Hash  (cost=78630.16..78630.16 rows=431819 width=8) (actual time=723.735..723.735 rows=0 loops=2)
                                      Buckets: 131072  Batches: 32  Memory Usage: 0kB
                                      ->  Parallel Seq Scan on phone_numbers  (cost=0.00..78630.16 rows=431819 width=8) (actual time=723.514..723.514 rows=0 loops=2)
                                            Filter: ("position"(lower((value)::text), 'as'::text) > 0)
                                            Rows Removed by Filter: 2007960
                          ->  Hash Join  (cost=0.38..74780.48 rows=250940 width=8) (actual time=0.888..0.888 rows=0 loops=1)
                                Hash Cond: (contacts_1.company_id = companies.id)
                                ->  Parallel Seq Scan on contacts contacts_1  (cost=0.00..59315.91 rows=1003762 width=16) (actual time=0.009..0.009 rows=1 loops=1)
                                ->  Hash  (cost=0.33..0.33 rows=1 width=8) (actual time=0.564..0.564 rows=0 loops=1)
                                      Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                      ->  Seq Scan on companies  (cost=0.00..0.33 rows=1 width=8) (actual time=0.563..0.563 rows=0 loops=1)
                                            Filter: ("position"(lower((name)::text), 'as'::text) > 0)
                                            Rows Removed by Filter: 4
                          ->  Parallel Seq Scan on contacts contacts_2  (cost=0.00..66844.13 rows=334588 width=8) (actual time=0.119..315.032 rows=20398 loops=5)
                                Filter: ("position"(lower((last_name)::text), 'as'::text) > 0)
                                Rows Removed by Filter: 782612
                          ->  Parallel Seq Scan on contacts contacts_3  (cost=0.00..66844.13 rows=334588 width=8) (actual time=0.510..558.791 rows=32144 loops=3)
                                Filter: ("position"(lower((first_name)::text), 'as'::text) > 0)
                                Rows Removed by Filter: 1306206
Planning Time: 2.115 ms
Execution Time: 1040.620 ms

All those casts seem pretty useless but don't really change anything unless you have trigram indexes on those columns — user330315
– user330315, Commented Sep 10, 2022 at 12:12
counts are notoriously slow, your where clause will always check all rows. — nbk
– nbk, Commented Sep 10, 2022 at 12:17
ILIKE '%a%' is a performance killer. The engine will need to scan the whole columns to decide which rows should be selected. Are you always looking for a or you can also be looking for other characters? In the first case you could use partial indexes. — The Impaler
– The Impaler, Commented Sep 10, 2022 at 13:02
you can use substring instead of like - delftstack.com/howto/postgres/… — Hogan
– Hogan, Commented Sep 10, 2022 at 15:09
50000 is a good enough estimate. Did you ever question if the 320000 hits Google promises you are accurate or not? Just be honest and say "result 1 through 50 of approximately 50000". Paging will work just fine unless you promise to take the user to the last page. And even with an exact count you could be wrong, because potential search results could be added concurrently. — Laurenz Albe
– Laurenz Albe, Commented Sep 12, 2022 at 3:56

samaswin · Accepted Answer · 2022-09-10 20:09:37Z

It's hard to help you, because I don't have acces to your data. Let me try...

EXPLAIN ANALYZE report shows that:

Yor query doesn't using indexes. Full scan on table phone_numbers tooks 1.202 second, and 0.681 senod on contacts table.
"Rows Removed by Join Filter: 763817". "Parallel Hash Join (cost=137684.86..201964.34 rows=1003781 width=41) (actual time=3006.633..4596.987 rows=803010 loops=5)" . So this query joins ~800k rows and then filter 763k of it.

Maybe you can reverse that. This should speed up (but that needs to be checked).

For example you can test this - rewrite your query in this direction:

    SELECT COUNT( ID)
    FROM
     (
       SELECT "contacts"."id"
        FROM "contacts" 
        Where <filters on contract here>
       union
       SELECT "contacts"."id"
        FROM "contacts"
        where phone_number_id in ( select   "phone_numbers"."id"
                                    from "phone_numbers" 
                                   where <filters on  phone_numbers here> 
                                 ) as A
       union
       SELECT "contacts"."id"
        FROM "contacts"
        where company_id in ( select   "companies"."id"
                                    from "companies" 
                                   where <filters on  companies here> )
                            ) as B

Two indexes: one on column contacts.phone_number_id and another on contacts.company_id might help.

EDIT: It using index on "phone_numbers"."id" with nested loop it tooks 5 seconds. Try to avoid this.

Please check, what it will do for this:

SELECT Count(id) AS id
FROM   (
              SELECT contacts.id AS id
              FROM   contacts 
              WHERE  (
                            contacts.last_name ilike '%as%')
              OR     (
                            contacts.last_name ilike '%as%')
              UNION
              SELECT contacts.id AS id
              FROM   contacts
              WHERE  contacts.phone_number_id IN
                     (
                            SELECT to_number(to_char(phone_numbers.id))) /* just for disable index scan for that column */ AS phone_number_id
                            FROM   phone_numbers
                            WHERE  phone_numbers.value ilike '%as%')
              UNION
              SELECT contacts.id AS id
              FROM   contacts
              WHERE  contacts.company_id IN
                     (
                            SELECT companies.id AS company_id
                            FROM   companies
                            WHERE  companies.name ilike '%as%' )) AS ID

Aggregate  (cost=419095.35..419095.40 rows=1 width=8) (actual time=13235.986..13236.335 rows=1 loops=1)
  ->  Unique  (cost=346875.23..353155.24 rows=1256002 width=8) (actual time=13211.350..13230.729 rows=195963 loops=1)
        ->  Sort  (cost=346875.23..350015.24 rows=1256002 width=8) (actual time=13211.349..13219.607 rows=195963 loops=1)
              Sort Key: contacts.id
              Sort Method: external merge  Disk: 3472kB
              ->  Append  (cost=2249.63..218658.27 rows=1256002 width=8) (actual time=5927.019..13164.421 rows=195963 loops=1)
                    ->  Gather  (cost=2249.63..48279.58 rows=251838 width=8) (actual time=5927.019..6911.795 rows=195963 loops=1)
                          Workers Planned: 4
                          Workers Launched: 4
                          ->  Parallel Bitmap Heap Scan on contacts  (cost=2239.63..48017.74 rows=62960 width=8) (actual time=5861.480..6865.957 rows=39193 loops=5)
                                Recheck Cond: (((first_name)::text ~~* '%as%'::text) OR ((last_name)::text ~~* '%as%'::text))
                                Rows Removed by Index Recheck: 763815
                                Heap Blocks: exact=10860 lossy=6075
                                ->  BitmapOr  (cost=2239.63..2239.63 rows=255705 width=0) (actual time=5917.966..5917.966 rows=0 loops=1)
                                      ->  Bitmap Index Scan on idx_trgm_contacts_first_name  (cost=0.00..1291.57 rows=156527 width=0) (actual time=2972.404..2972.404 rows=4015039 loops=1)
                                            Index Cond: ((first_name)::text ~~* '%as%'::text)
                                      ->  Bitmap Index Scan on idx_trgm_contacts_last_name  (cost=0.00..822.14 rows=99177 width=0) (actual time=2945.560..2945.560 rows=4015038 loops=1)
                                            Index Cond: ((last_name)::text ~~* '%as%'::text)
                    ->  Nested Loop  (cost=81.96..384.33 rows=402 width=8) (actual time=6213.028..6213.028 rows=0 loops=1)
                          ->  Unique  (cost=81.52..83.53 rows=402 width=8) (actual time=6213.027..6213.027 rows=0 loops=1)
                                ->  Sort  (cost=81.52..82.52 rows=402 width=8) (actual time=6213.027..6213.027 rows=0 loops=1)
                                      Sort Key: ((NULLIF((phone_numbers.id)::text, ''::text))::integer)
                                      Sort Method: quicksort  Memory: 25kB
                                      ->  Index Scan using idx_trgm_phone_value on phone_numbers  (cost=0.41..64.13 rows=402 width=8) (actual time=6213.006..6213.006 rows=0 loops=1)
                                            Index Cond: ((value)::text ~~* '%as%'::text)
                                            Rows Removed by Index Recheck: 4015921
                          ->  Index Scan using index_contacts_on_phone_number_id on contacts contacts_1  (cost=0.44..0.70 rows=1 width=16) (never executed)
                                Index Cond: (phone_number_id = (NULLIF((phone_numbers.id)::text, ''::text))::integer)
                    ->  Gather  (cost=10.36..75794.22 rows=1003762 width=8) (actual time=25.691..25.709 rows=0 loops=1)
                          Workers Planned: 4
                          Workers Launched: 4
                          ->  Hash Join  (cost=0.36..74780.46 rows=250940 width=8) (actual time=2.653..2.653 rows=0 loops=5)
                                Hash Cond: (contacts_2.company_id = companies.id)
                                ->  Parallel Seq Scan on contacts contacts_2  (cost=0.00..59315.91 rows=1003762 width=16) (actual time=0.244..0.244 rows=1 loops=5)
                                ->  Hash  (cost=0.31..0.31 rows=1 width=8) (actual time=0.244..0.244 rows=0 loops=5)
                                      Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                      ->  Seq Scan on companies  (cost=0.00..0.31 rows=1 width=8) (actual time=0.244..0.244 rows=0 loops=5)
                                            Filter: ((name)::text ~~* '%as%'::text)
                                            Rows Removed by Filter: 4
Planning Time: 1.458 ms
Execution Time: 13236.949 ms

I tried below,

SELECT Count(id) AS id
FROM   (
              SELECT contacts.id AS id
              FROM   contacts 
              WHERE  (substring(LOWER(contacts.first_name), position('as' in LOWER(first_name)), 2) = 'as')
              OR     (substring(LOWER(contacts.last_name), position('as' in LOWER(last_name)), 2) = 'as')
              UNION
              SELECT contacts.id AS id
              FROM   contacts
              WHERE  contacts.phone_number_id IN
                     (
                            SELECT NULLIF(CAST(phone_numbers.id AS text), '')::int AS phone_number_id 
                            FROM   phone_numbers
                            WHERE  (substring(LOWER(phone_numbers.value), position('as' in LOWER(phone_numbers.value)), 2) = 'as'))
              UNION
              SELECT contacts.id AS id
              FROM   contacts
              WHERE  contacts.company_id IN
                     (
                            SELECT companies.id AS company_id
                            FROM   companies
                            WHERE  (substring(LOWER(companies.name), position('as' in LOWER(companies.name)), 2) = 'as') )) AS ID

Aggregate  (cost=508646.88..508646.93 rows=1 width=8) (actual time=1455.892..1455.995 rows=1 loops=1)
  ->  Unique  (cost=447473.09..452792.55 rows=1063892 width=8) (actual time=1431.464..1450.434 rows=195963 loops=1)
        ->  Sort  (cost=447473.09..450132.82 rows=1063892 width=8) (actual time=1431.464..1439.267 rows=195963 loops=1)
              Sort Key: contacts.id
              Sort Method: external merge  Disk: 3472kB
              ->  Append  (cost=10.00..340141.41 rows=1063892 width=8) (actual time=0.391..1370.557 rows=195963 loops=1)
                    ->  Gather  (cost=10.00..84460.02 rows=40050 width=8) (actual time=0.391..983.457 rows=195963 loops=1)
                          Workers Planned: 4
                          Workers Launched: 4
                          ->  Parallel Seq Scan on contacts  (cost=0.00..84409.97 rows=10012 width=8) (actual time=1.696..987.285 rows=39193 loops=5)
                                Filter: (("substring"(lower((first_name)::text), "position"(lower((first_name)::text), 'as'::text), 2) = 'as'::text) OR ("substring"(lower((last_name)::text), "position"(lower((last_name)::text), 'as'::text), 2) = 'as'::text))
                                Rows Removed by Filter: 763817
                    ->  Nested Loop  (cost=85188.17..100095.23 rows=20080 width=8) (actual time=364.076..364.125 rows=0 loops=1)
                          ->  HashAggregate  (cost=85187.73..86191.73 rows=20080 width=8) (actual time=364.074..364.123 rows=0 loops=1)
                                Group Key: (NULLIF((phone_numbers.id)::text, ''::text))::integer
                                Batches: 1  Memory Usage: 793kB
                                ->  Gather  (cost=10.00..85137.53 rows=20080 width=8) (actual time=363.976..364.025 rows=0 loops=1)
                                      Workers Planned: 3
                                      Workers Launched: 3
                                      ->  Parallel Seq Scan on phone_numbers  (cost=0.00..85107.45 rows=6477 width=8) (actual time=357.030..357.031 rows=0 loops=4)
                                            Filter: ("substring"(lower((value)::text), "position"(lower((value)::text), 'as'::text), 2) = 'as'::text)
                                            Rows Removed by Filter: 1003980
                          ->  Index Scan using index_contacts_on_phone_number_id on contacts contacts_1  (cost=0.44..0.64 rows=1 width=16) (never executed)
                                Index Cond: (phone_number_id = (NULLIF((phone_numbers.id)::text, ''::text))::integer)
                    ->  Gather  (cost=10.40..75794.26 rows=1003762 width=8) (actual time=6.889..6.910 rows=0 loops=1)
                          Workers Planned: 4
                          Workers Launched: 4
                          ->  Hash Join  (cost=0.40..74780.50 rows=250940 width=8) (actual time=0.138..0.139 rows=0 loops=5)
                                Hash Cond: (contacts_2.company_id = companies.id)
                                ->  Parallel Seq Scan on contacts contacts_2  (cost=0.00..59315.91 rows=1003762 width=16) (actual time=0.004..0.004 rows=1 loops=5)
                                ->  Hash  (cost=0.35..0.35 rows=1 width=8) (actual time=0.081..0.081 rows=0 loops=5)
                                      Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                      ->  Seq Scan on companies  (cost=0.00..0.35 rows=1 width=8) (actual time=0.081..0.081 rows=0 loops=5)
                                            Filter: ("substring"(lower((name)::text), "position"(lower((name)::text), 'as'::text), 2) = 'as'::text)
                                            Rows Removed by Filter: 4
Planning Time: 0.927 ms
Execution Time: 1456.742 ms

Solution I Tried and got a error. I edited in the description. Please help me on this
@samaswin -- I updated the query above to add alias to the two sub-queries -- you should not get the error you showed anymore
AD "Solution Tried: I tried the Robert solution and got 16 Secs to run". Can you please paste EXPLAIN ANALYZE of this query?
to_char(phone_numbers.id) it is a bigint. That causing issues. I used NULLIF(CAST(phone_numbers.id AS text), '')::int AS phone_number_id @RobertG

Collectives™ on Stack Overflow

Need to improve count performance in PostgreSQL for this query

1 Answer 1

8 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

8 Comments

Your Answer

Sign up or log in

Post as a guest

Related