SELECT COUNT(*)
FROM "businesses"
WHERE (businesses.postal_code_id IN
         (SELECT id
          FROM postal_codes
          WHERE lower(city) IN ('los angeles')
            AND lower(region) = 'california'))
  AND (EXISTS
         (SELECT *
          FROM categorizations c
          WHERE c.business_id=businesses.id
            AND c.category_id IN (86)))

I have a Postgres database with businesses, categories, and locations. This query took 95665.9 ms to execute, and I'm pretty sure the bottleneck is in categorizations. Is there a better way to execute this? The resulting count was 1032.

=# EXPLAIN ANALYZE SELECT COUNT(*)
-# FROM "businesses"
-# WHERE (businesses.postal_code_id IN
(#          (SELECT id
(#           FROM postal_codes
(#           WHERE lower(city) IN ('los angeles')
(#             AND lower(region) = 'california'));
                                                                             QUERY PLAN                                                                              
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4007.74..4007.75 rows=1 width=0) (actual time=263820.923..263820.924 rows=1 loops=1)
   ->  Nested Loop  (cost=41.93..4005.20 rows=1015 width=0) (actual time=469.716..263679.865 rows=112513 loops=1)
         ->  HashAggregate  (cost=15.59..15.60 rows=1 width=4) (actual time=332.664..332.946 rows=82 loops=1)
               ->  Bitmap Heap Scan on postal_codes  (cost=11.57..15.59 rows=1 width=4) (actual time=84.772..332.407 rows=82 loops=1)
                     Recheck Cond: ((lower((city)::text) = 'los angeles'::text) AND (lower((region)::text) = 'california'::text))
                     ->  BitmapAnd  (cost=11.57..11.57 rows=1 width=0) (actual time=77.530..77.530 rows=0 loops=1)
                           ->  Bitmap Index Scan on idx_postal_codes_lower_city  (cost=0.00..5.66 rows=187 width=0) (actual time=22.800..22.800 rows=82 loops=1)
                                 Index Cond: (lower((city)::text) = 'los angeles'::text)
                           ->  Bitmap Index Scan on idx_postal_codes_lower_region  (cost=0.00..5.66 rows=187 width=0) (actual time=54.714..54.714 rows=2356 loops=1)
                                 Index Cond: (lower((region)::text) = 'california'::text)
         ->  Bitmap Heap Scan on businesses  (cost=26.34..3976.91 rows=1015 width=4) (actual time=95.926..3208.426 rows=1372 loops=82)
               Recheck Cond: (postal_code_id = postal_codes.id)
               ->  Bitmap Index Scan on index_businesses_on_postal_code_id  (cost=0.00..26.08 rows=1015 width=0) (actual time=89.864..89.864 rows=1380 loops=82)
                     Index Cond: (postal_code_id = postal_codes.id)
 Total runtime: 263821.016 ms
(15 rows)

And the join version:

EXPLAIN ANALYZE SELECT count(*) FROM businesses
LEFT JOIN postal_codes
ON businesses.postal_code_id = postal_codes.id
WHERE lower(postal_codes.city) = 'los angeles'
AND lower(postal_codes.region) = 'california';

                                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4006.14..4006.15 rows=1 width=0) (actual time=143357.170..143357.171 rows=1 loops=1)
   ->  Nested Loop  (cost=37.91..4005.19 rows=381 width=0) (actual time=138.666..143218.064 rows=112514 loops=1)
         ->  Bitmap Heap Scan on postal_codes  (cost=11.57..15.59 rows=1 width=4) (actual time=0.559..33.957 rows=82 loops=1)
               Recheck Cond: ((lower((city)::text) = 'los angeles'::text) AND (lower((region)::text) = 'california'::text))
               ->  BitmapAnd  (cost=11.57..11.57 rows=1 width=0) (actual time=0.532..0.532 rows=0 loops=1)
                     ->  Bitmap Index Scan on idx_postal_codes_lower_city  (cost=0.00..5.66 rows=187 width=0) (actual time=0.058..0.058 rows=82 loops=1)
                           Index Cond: (lower((city)::text) = 'los angeles'::text)
                     ->  Bitmap Index Scan on idx_postal_codes_lower_region  (cost=0.00..5.66 rows=187 width=0) (actual time=0.461..0.461 rows=2356 loops=1)
                           Index Cond: (lower((region)::text) = 'california'::text)
         ->  Bitmap Heap Scan on businesses  (cost=26.34..3976.91 rows=1015 width=4) (actual time=55.493..1742.407 rows=1372 loops=82)
               Recheck Cond: (postal_code_id = postal_codes.id)
               ->  Bitmap Index Scan on index_businesses_on_postal_code_id  (cost=0.00..26.09 rows=1015 width=0) (actual time=53.141..53.141 rows=1381 loops=82)
                     Index Cond: (postal_code_id = postal_codes.id)
 Total runtime: 143357.260 ms

The result is much bigger with the simplified query, but given that there are indexes and I'm doing only ONE join, I'm surprised it takes so long.

  • Yes, it is really strange that such a cheap query runs so long. Can you a) on a development machine, install Postgres from source, compiled without optimization and with debug and profiling support, and try to get a low-level profile; b) try to penalize nested loops; c) check your server: CPU speed, I/O speed, Postgres configuration. A query with this cost should be evaluated in 1-2 sec. Commented Jul 25, 2013 at 7:49
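
For point b), one way to penalize nested loops for a single session is the `enable_nestloop` planner setting. This is only a diagnostic sketch, assuming PostgreSQL 9.0 or later (for the `BUFFERS` option) and using the simplified query from the question with an inner join; it is not meant as a permanent configuration change.

```sql
-- Session-local experiment: discourage nested loops, then re-check the plan.
SET enable_nestloop = off;

-- BUFFERS shows how many pages came from cache versus disk,
-- which helps with point c) (I/O speed).
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM businesses
JOIN postal_codes ON businesses.postal_code_id = postal_codes.id
WHERE lower(postal_codes.city) = 'los angeles'
  AND lower(postal_codes.region) = 'california';

-- Restore the default planner behaviour for this session.
RESET enable_nestloop;
```

If the planner switches to a hash or merge join and the runtime drops sharply, the row estimate on postal_codes (rows=1 estimated, 82 actual) is the likely culprit; if the runtime stays high, slow I/O on businesses is the more probable cause.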

1 Answer


Try using a functional index on the column city:

CREATE INDEX ON postal_codes((lower(city)))

There is a strong dependency between the columns city and region, so sometimes you have to separate these predicates to get more accurate planner estimates. If you need better estimates, add columns lower_city and lower_region to the table postal_codes; PostgreSQL does not keep statistics over indexes.
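
A minimal sketch of the lower_city / lower_region idea follows. The column type `text` and the index name `idx_postal_codes_lower_region_city` are assumptions for illustration, and the new columns would still need to be kept in sync on insert and update (for example by a trigger or by application code), which is not shown here.

```sql
-- Add plain columns so ANALYZE collects ordinary per-column statistics.
ALTER TABLE postal_codes ADD COLUMN lower_city text;
ALTER TABLE postal_codes ADD COLUMN lower_region text;

-- Backfill from the existing columns.
UPDATE postal_codes
SET lower_city = lower(city),
    lower_region = lower(region);

-- Hypothetical index name; one composite index covers both predicates.
CREATE INDEX idx_postal_codes_lower_region_city
    ON postal_codes (lower_region, lower_city);

-- Refresh planner statistics for the new columns.
ANALYZE postal_codes;
```

The query would then filter on `lower_city = 'los angeles' AND lower_region = 'california'` instead of calling `lower()` on each row.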

Post the execution plan here, via http://explain.depesz.com/ if possible: the result of EXPLAIN ANALYZE YOUR_QUERY.

9.1 should translate the correlated subquery to a semijoin automatically, but I am not sure. Try to rewrite your query from subqueries to an INNER JOIN-only form (it probably won't help, but maybe).
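
One possible INNER JOIN-only form of the original query is sketched below. It assumes that businesses.id is the primary key; `COUNT(DISTINCT ...)` is used so that a business with more than one matching categorizations row is still counted once, as it would be with `EXISTS`.

```sql
SELECT COUNT(DISTINCT b.id)
FROM businesses b
JOIN postal_codes p ON p.id = b.postal_code_id
JOIN categorizations c ON c.business_id = b.id
WHERE lower(p.city) = 'los angeles'
  AND lower(p.region) = 'california'
  AND c.category_id = 86;
```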


2 Comments

The functional indexes did not work and neither did the join, unfortunately. I'm not really sure what else I can do to optimize this beyond more hardware or denormalization. I simplified the query and updated the post with EXPLAINs for both.
Do you have to use LEFT JOIN? INNER JOIN should probably be enough.
