
I have a single small table (~400k rows). The table is indexed by collection_id and contains a JSON column with several GIN indexes defined; one of them is on the value tagline.id.

The query to get all the objects with a specific tagline.id is sometimes VERY slow:

explain (analyze, buffers)
SELECT "objects_object"."created",
       "objects_object"."modified",
       "objects_object"."_id",
       "objects_object"."id",
       "objects_object"."collection_id",
       "objects_object"."data",
       "objects_object"."search",
       "objects_object"."location"::bytea
FROM "objects_object"
WHERE ("objects_object"."collection_id" IN (3381, 3321, 3312, 3262, 3068, 2684, 2508, 2159, 2158, 2154, 2157, 2156)
  AND (("objects_object"."data" #>> ARRAY['tagline','id']))::float IN ('8')
  AND ("objects_object"."data" -> 'tagline') ? 'id')
ORDER BY "objects_object"."created" DESC,
         "objects_object"."id" ASC
LIMIT 101;                                                                         


    QUERY PLAN                                                                               
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=8.46..8.47 rows=1 width=1239) (actual time=5513.374..5513.399 rows=101 loops=1)
   Buffers: shared hit=4480 read=6261
   ->  Sort  (cost=8.46..8.47 rows=1 width=1239) (actual time=5513.372..5513.389 rows=101 loops=1)
         Sort Key: created DESC, id
         Sort Method: top-N heapsort  Memory: 247kB
         Buffers: shared hit=4480 read=6261
         ->  Index Scan using index_tagline_id_float_51a27976 on objects_object  (cost=0.42..8.45 rows=1 width=1239) (actual time=943.689..5513.002 rows=235 loops=1)
               Index Cond: (((data #>> '{tagline,id}'::text[]))::double precision = '8'::double precision)
               Filter: (collection_id = ANY ('{3381,3321,3312,3262,3068,2684,2508,2159,2158,2154,2157,2156}'::integer[]))
               Rows Removed by Filter: 47295
               Buffers: shared hit=4480 read=6261
 Planning time: 0.244 ms
 Execution time: 5513.439 ms
(13 rows)

If the query is executed multiple times, the execution time drops to ~5 ms.

What is taking so long? Why does the execution time drop so much after the first run?

I don't think it's memory-related, since the default work_mem (4 MB) is much higher than what the sort required (247 kB).
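
For reference, a quick sanity check of the settings I'm comparing against (assuming the 4 MB figure refers to the default work_mem; shared_buffers is mentioned only because the read counts in the plan hint at caching):

SHOW work_mem;        -- 4MB by default; the top-N heapsort above needed only ~247kB
SHOW shared_buffers;  -- the "read=6261" blocks in the plan were fetched from outside this cache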

EDIT: Index definitions:

SELECT indexdef FROM pg_indexes 
WHERE indexname = 'index_tagline_id_float_51a27976'; 
                                                                                                 indexdef                                                                                                 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 CREATE INDEX index_tagline_id_float_51a27976 ON public.objects_object USING btree ((((data #>> ARRAY['tagline'::text, 'id'::text]))::double precision)) WHERE ((data -> 'tagline'::text) ? 'id'::text)
(1 row)

SELECT indexdef FROM pg_indexes 
WHERE indexname = 'objects_object_collection_id_6f1559f5'; 
                                                indexdef                                                 
---------------------------------------------------------------------------------------------------------
 CREATE INDEX objects_object_collection_id_6f1559f5 ON public.objects_object USING btree (collection_id)
(1 row)

EDIT:

After adding a new index named test:

select indexdef from pg_indexes where indexname='test'; 
                                                                                          indexdef                                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 CREATE INDEX test ON public.objects_object USING btree ((((data #>> ARRAY['tagline'::text, 'id'::text]))::double precision), collection_id) WHERE ((data -> 'tagline'::text) ? 'id'::text)
(1 row)

The execution time decreased, but so did the shared buffer hits, so I'm not sure this actually improved performance:

                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=8.46..8.47 rows=1 width=1238) (actual time=1721.260..1721.281 rows=101 loops=1)
   Buffers: shared hit=5460 read=5115
   ->  Sort  (cost=8.46..8.47 rows=1 width=1238) (actual time=1721.257..1721.270 rows=101 loops=1)
         Sort Key: created DESC, id
         Sort Method: top-N heapsort  Memory: 298kB
         Buffers: shared hit=5460 read=5115
         ->  Index Scan using test on objects_object  (cost=0.42..8.45 rows=1 width=1238) (actual time=1682.637..1720.793 rows=235 loops=1)
               Index Cond: (((data #>> '{tagline,id}'::text[]))::double precision = '8'::double precision)
               Filter: (collection_id = ANY ('{3381,3321,3312,3262,3068,2684,2508,2159,2158,2154,2157,2156}'::integer[]))
               Rows Removed by Filter: 47295
               Buffers: shared hit=5454 read=5115
 Planning time: 238.364 ms
 Execution time: 1762.996 ms
(13 rows)

The problem seems to be that collection_id should be part of the index condition rather than the filter; this would avoid fetching a large amount of data from (slow) storage.

Why is the index not working as expected?

UPDATE: Apparently the order of the columns in the index impacts the query plan. I rewrote the index as:

select indexdef from pg_indexes where indexname='test'; 
                                                                                          indexdef                                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 CREATE INDEX test ON public.objects_object USING btree (collection_id, (((data #>> ARRAY['tagline'::text, 'id'::text]))::double precision)) WHERE ((data -> 'tagline'::text) ? 'id'::text)

Running the query now, we can see that far fewer records are read:

 Limit  (cost=57.15..57.16 rows=1 width=1177) (actual time=1.043..1.059 rows=101 loops=1)
   Buffers: shared hit=101 read=10
   ->  Sort  (cost=57.15..57.16 rows=1 width=1177) (actual time=1.040..1.047 rows=101 loops=1)
         Sort Key: created DESC, id
         Sort Method: top-N heapsort  Memory: 304kB
         Buffers: shared hit=101 read=10
         ->  Index Scan using test on objects_object  (cost=0.42..57.14 rows=1 width=1177) (actual time=0.094..0.670 rows=232 loops=1)
               Index Cond: ((collection_id = ANY ('{3381,3321,3312,3262,3068,2684,2508,2159,2158,2154,2157,2156}'::integer[])) AND (((data #>> '{tagline,id}'::text[]))::double precision = '8'::double precision))
               Buffers: shared hit=95 read=10
 Planning time: 416.365 ms
 Execution time: 43.463 ms
(11 rows)
  • What is the definition of the existing (partial?) index? Do you have an index on collection_id? Commented Mar 18, 2019 at 10:37
  • Yes, there is an index on collection_id, but as far as I can see it is not being used for this query. Commented Mar 18, 2019 at 10:43
  • Does ANALYZE on the table improve the performance? Commented Mar 18, 2019 at 10:49
  • Not really; it improved a little but is still very slow. Commented Mar 18, 2019 at 10:59
  • Please add the index definition to the question. It is normal that last_analyze is NULL (but last_autoanalyze shouldn't be). Commented Mar 18, 2019 at 11:22

1 Answer


This particular query could be sped up with the following index:

CREATE INDEX ON public.objects_object (
   ((data #>> ARRAY['tagline'::text, 'id'::text])::double precision),
   collection_id
) WHERE (data -> 'tagline') ? 'id';

This will avoid the filter in the index scan, which is where most of the time is spent.
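
A minimal way to verify the effect, sketched here with a statistics refresh and a shortened output column and IN list borrowed from the query in the question (purely illustrative), would be:

ANALYZE objects_object;  -- refresh statistics so the planner can cost the new index

EXPLAIN (ANALYZE, BUFFERS)
SELECT "objects_object"."_id"
FROM "objects_object"
WHERE "objects_object"."collection_id" IN (3381, 3321)
  AND (("objects_object"."data" #>> ARRAY['tagline','id']))::float IN ('8')
  AND ("objects_object"."data" -> 'tagline') ? 'id'
ORDER BY "objects_object"."created" DESC,
         "objects_object"."id" ASC
LIMIT 101;

The thing to check in the output is whether collection_id now appears in the Index Cond rather than in a Filter line.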


6 Comments

One thing I can't understand: why did EXPLAIN ANALYZE return different execution times on repeated runs?
Because more data was cached. See the line Buffers: shared hit=4480 read=6261 in your execution plan.
Actually I was too optimistic; the execution time is still pretty high (see edit).
It is mysterious why the index is not used instead of the filter condition. You ran ANALYZE after you created the extra index, right?
I ran ANALYZE, but it seems like the order in which the fields are specified in the index and in the query matters; check the update I posted.