I'm designing an application for a shop catalog and have run into quite slow PostgreSQL performance.
Here is a simplified DB schema (in reality there are extra tables for many-to-many relations): db scheme
I'd like to implement filtering by attributes (color, size, brand, etc.) based on the selected catalog categories (T-shirts, bags, etc.).
Here is an example query that selects the available attributes for a selected list of categories:
SELECT DISTINCT T1.attribute_id
FROM item T0 LEFT OUTER JOIN item_attr_color T1 ON ( T0.id = T1.item_id )
WHERE T0.catalog_id IN (1, 2, 6, 7, 14, 23, 26, 31, 36, 37, 45, 67, 70, 76, 77, 81, 95, 112, 118, 119, 120, 10, 11, 29, 101, 12, 13, 16, 17, 19, 20, 30, 33, 35, 42, 43, 47, 48, 54, 57, 58, 69, 78, 109, 56, 64, 65, 66, 68, 71, 74, 75, 93, 72, 73, 87, 88, 96, 99, 103, 105, 108, 110);
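Since the query only needs distinct attribute_id values, the LEFT JOIN contributes nothing beyond one possible NULL row for items without a color. A semi-join form (a sketch, assuming the same table and column names as above; the only behavioral difference is that the LEFT JOIN version can also emit a NULL attribute_id) may let the planner drive the query from the attribute table instead of hashing all matching items first:

```sql
-- Sketch of an equivalent semi-join formulation of the query above.
SELECT DISTINCT T1.attribute_id
FROM item_attr_color T1
WHERE EXISTS (
    SELECT 1
    FROM item T0
    WHERE T0.id = T1.item_id
      AND T0.catalog_id IN (1, 2, 6, 7, 14, 23, 26, 31, 36, 37, 45, 67, 70,
                            76, 77, 81, 95, 112, 118, 119, 120, 10, 11, 29,
                            101, 12, 13, 16, 17, 19, 20, 30, 33, 35, 42, 43,
                            47, 48, 54, 57, 58, 69, 78, 109, 56, 64, 65, 66,
                            68, 71, 74, 75, 93, 72, 73, 87, 88, 96, 99, 103,
                            105, 108, 110)
);
```

Whether this is faster depends on the planner; comparing its EXPLAIN ANALYZE output against the original is the way to tell.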
Currently the database is rather small (~100k records), but this query still takes up to 400 ms, which is quite a lot: I have 10 different filter attributes, so these queries alone take about 4 s in total, which is unacceptable.
I have B-tree indexes on all vital fields. Here is the output of EXPLAIN ANALYZE:
HashAggregate  (cost=28309.30..28309.43 rows=13 width=4) (actual time=343.343..343.347 rows=14 loops=1)
  ->  Hash Right Join  (cost=24284.42..28074.04 rows=94103 width=4) (actual time=185.278..315.749 rows=115745 loops=1)
        Hash Cond: (t1.item_id = t0.id)
        ->  Seq Scan on core_item_attr_colors t1  (cost=0.00..1797.13 rows=108913 width=8) (actual time=0.006..18.387 rows=107175 loops=1)
        ->  Hash  (cost=23108.13..23108.13 rows=94103 width=4) (actual time=185.182..185.182 rows=93778 loops=1)
              Buckets: 16384  Batches: 1  Memory Usage: 3297kB
              ->  Seq Scan on core_item t0  (cost=0.00..23108.13 rows=94103 width=4) (actual time=0.020..153.334 rows=93778 loops=1)
                    Filter: (catalog_id = ANY ('{1,2,6,7,14,23,26,31,36,37,45,67,70,76,77,81,95,112,118,119,120,10,11,29,101,12,13,16,17,19,20,30,33,35,42,43,47,48,54,57,58,69,78,109,56,64,65,66,68,71,74,75,93,72,73,87,88,96,99,103,105,108,110}'::integer[]))
                    Rows Removed by Filter: 19677
Total runtime: 361.231 ms
As you can see, it doesn't use any indexes, but I've noticed that reducing the number of categories in the IN list eventually makes the planner switch to an index:
HashAggregate  (cost=18685.04..18685.17 rows=13 width=4) (actual time=166.760..166.764 rows=14 loops=1)
  ->  Hash Right Join  (cost=15515.08..18626.42 rows=23447 width=4) (actual time=56.499..156.865 rows=26501 loops=1)
        Hash Cond: (u2.item_id = u0.id)
        ->  Seq Scan on core_item_attr_colors u2  (cost=0.00..1797.13 rows=108913 width=8) (actual time=0.010..25.706 rows=107175 loops=1)
        ->  Hash  (cost=15221.99..15221.99 rows=23447 width=4) (actual time=56.444..56.444 rows=23099 loops=1)
              Buckets: 4096  Batches: 1  Memory Usage: 813kB
              ->  Bitmap Heap Scan on core_item u0  (cost=1058.03..15221.99 rows=23447 width=4) (actual time=9.732..45.643 rows=23099 loops=1)
                    Recheck Cond: (catalog_id = ANY ('{1,2,6,7,14,23,26,31,36,37,45,67,70,76,77,81,95,112,118,119}'::integer[]))
                    ->  Bitmap Index Scan on core_item_89ed0239  (cost=0.00..1052.17 rows=23447 width=0) (actual time=6.523..6.523 rows=23099 loops=1)
                          Index Cond: (catalog_id = ANY ('{1,2,6,7,14,23,26,31,36,37,45,67,70,76,77,81,95,112,118,119}'::integer[]))
Total runtime: 166.858 ms
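One thing worth trying is giving the planner index-only-scan options: with a composite index covering (catalog_id, id), the catalog filter can be answered without touching the heap, and likewise for the join on the attribute table. A sketch (index names are illustrative, and whether the planner actually uses them depends on up-to-date statistics):

```sql
-- Illustrative covering indexes (names made up for this sketch):
CREATE INDEX item_catalog_id_id_idx ON item (catalog_id, id);
CREATE INDEX item_attr_color_item_attr_idx ON item_attr_color (item_id, attribute_id);

-- Refresh planner statistics and the visibility map so index-only scans are considered:
VACUUM ANALYZE item;
VACUUM ANALYZE item_attr_color;
```

After this, re-running EXPLAIN ANALYZE shows whether the planner picks the new indexes for the large IN list.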
I've tried replacing PostgreSQL with SQLite and got quite impressive results for the same query on exactly the same data set: it took less than 60 ms.
Here is my config file:
max_connections = 100
temp_buffers = 8MB
work_mem = 96MB
maintenance_work_mem = 512MB
effective_cache_size = 512MB
The server has 6G of RAM and an SSD disk.
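On a 6 GB / SSD machine, the planner's cost settings may matter as much as the buffer sizes: with the defaults, PostgreSQL assumes random I/O is four times as expensive as sequential I/O, which biases it toward sequential scans for large IN lists. A starting-point sketch (values are assumptions to experiment with, not tuned recommendations):

```
effective_cache_size = 4GB       # estimate of OS + PG cache on a 6 GB box; higher values favor index plans
random_page_cost = 1.1           # on SSDs random reads cost nearly the same as sequential reads
shared_buffers = 1536MB          # roughly 25% of RAM is a common starting point
```

Lowering random_page_cost in particular can shift the crossover point at which the planner abandons the bitmap index scan for a sequential scan.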
What am I missing? I'd appreciate any suggestions on how to improve performance here.
UPDATE 1: shared_buffers = 1024MB, and it's PostgreSQL 9.3.