I have a single small table (~400k rows). The table is indexed by collection_id and contains a JSON column with several indexes defined on its values; one of them is on the value tagline.id.
The query to get all the objects with a specific tagline.id is sometimes very slow:
explain (analyze, buffers)
SELECT "objects_object"."created",
"objects_object"."modified",
"objects_object"."_id",
"objects_object"."id",
"objects_object"."collection_id",
"objects_object"."data",
"objects_object"."search",
"objects_object"."location"::bytea
FROM "objects_object"
WHERE ("objects_object"."collection_id" IN (3381, 3321, 3312, 3262, 3068, 2684, 2508, 2159, 2158, 2154, 2157, 2156)
AND (("objects_object"."data" #>> ARRAY['tagline','id']))::float IN ('8')
AND ("objects_object"."data" -> 'tagline') ? 'id')
ORDER BY "objects_object"."created" DESC,
"objects_object"."id" ASC
LIMIT 101;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=8.46..8.47 rows=1 width=1239) (actual time=5513.374..5513.399 rows=101 loops=1)
Buffers: shared hit=4480 read=6261
-> Sort (cost=8.46..8.47 rows=1 width=1239) (actual time=5513.372..5513.389 rows=101 loops=1)
Sort Key: created DESC, id
Sort Method: top-N heapsort Memory: 247kB
Buffers: shared hit=4480 read=6261
-> Index Scan using index_tagline_id_float_51a27976 on objects_object (cost=0.42..8.45 rows=1 width=1239) (actual time=943.689..5513.002 rows=235 loops=1)
Index Cond: (((data #>> '{tagline,id}'::text[]))::double precision = '8'::double precision)
Filter: (collection_id = ANY ('{3381,3321,3312,3262,3068,2684,2508,2159,2158,2154,2157,2156}'::integer[]))
Rows Removed by Filter: 47295
Buffers: shared hit=4480 read=6261
Planning time: 0.244 ms
Execution time: 5513.439 ms
(13 rows)
If executed multiple times, the execution time drops to ~5 ms.
What is taking so long? Why does the execution time drop that much after the first run?
I don't think it's memory related, since the default work_mem (4 MB) is much higher than the 247 kB the sort required.
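The buffer counts in the first plan (shared hit=4480 read=6261) suggest the slow first run is mostly cold-cache disk I/O rather than memory pressure. A minimal sketch to check how much of the table is currently sitting in shared_buffers, assuming the contrib extensions pg_buffercache and pg_prewarm are available (they are not installed by default):
-- one-time setup (needs appropriate privileges)
CREATE EXTENSION IF NOT EXISTS pg_buffercache;
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
-- how many 8 kB pages of objects_object are cached in shared_buffers right now
SELECT count(*) AS cached_pages
FROM pg_buffercache
WHERE relfilenode = pg_relation_filenode('objects_object');
-- optionally warm the table into the cache before timing the query again
SELECT pg_prewarm('objects_object');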
EDIT: Index definitions:
SELECT indexdef FROM pg_indexes
WHERE indexname = 'index_tagline_id_float_51a27976';
indexdef
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CREATE INDEX index_tagline_id_float_51a27976 ON public.objects_object USING btree ((((data #>> ARRAY['tagline'::text, 'id'::text]))::double precision)) WHERE ((data -> 'tagline'::text) ? 'id'::text)
(1 row)
SELECT indexdef FROM pg_indexes
WHERE indexname = 'objects_object_collection_id_6f1559f5';
indexdef
---------------------------------------------------------------------------------------------------------
CREATE INDEX objects_object_collection_id_6f1559f5 ON public.objects_object USING btree (collection_id)
(1 row)
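As a side check, the statistics views show whether each of these indexes is actually being used and how large it is; a quick, read-only query against the standard pg_stat_user_indexes view:
SELECT indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'objects_object'
ORDER BY idx_scan DESC;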
EDIT:
After adding the index test:
select indexdef from pg_indexes where indexname='test';
indexdef
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CREATE INDEX test ON public.objects_object USING btree ((((data #>> ARRAY['tagline'::text, 'id'::text]))::double precision), collection_id) WHERE ((data -> 'tagline'::text) ? 'id'::text)
(1 row)
The execution time decreased, but the shared buffer hits went up as well, so I'm not sure this actually improved the performance:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=8.46..8.47 rows=1 width=1238) (actual time=1721.260..1721.281 rows=101 loops=1)
Buffers: shared hit=5460 read=5115
-> Sort (cost=8.46..8.47 rows=1 width=1238) (actual time=1721.257..1721.270 rows=101 loops=1)
Sort Key: created DESC, id
Sort Method: top-N heapsort Memory: 298kB
Buffers: shared hit=5460 read=5115
-> Index Scan using test on objects_object (cost=0.42..8.45 rows=1 width=1238) (actual time=1682.637..1720.793 rows=235 loops=1)
Index Cond: (((data #>> '{tagline,id}'::text[]))::double precision = '8'::double precision)
Filter: (collection_id = ANY ('{3381,3321,3312,3262,3068,2684,2508,2159,2158,2154,2157,2156}'::integer[]))
Rows Removed by Filter: 47295
Buffers: shared hit=5454 read=5115
Planning time: 238.364 ms
Execution time: 1762.996 ms
(13 rows)
The problem seems to be that collection_id should be part of the index condition rather than the filter; that would avoid fetching a large number of rows from (slow) storage only to discard them.
Why is the index not working as expected?
UPDATE: Apparently the order of the index columns impacts the query plan. I rewrote the index as:
select indexdef from pg_indexes where indexname='test';
indexdef
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CREATE INDEX test ON public.objects_object USING btree (collection_id, (((data #>> ARRAY['tagline'::text, 'id'::text]))::double precision)) WHERE ((data -> 'tagline'::text) ? 'id'::text)
Running the query now, we can see the much lower number of rows read:
Limit (cost=57.15..57.16 rows=1 width=1177) (actual time=1.043..1.059 rows=101 loops=1)
Buffers: shared hit=101 read=10
-> Sort (cost=57.15..57.16 rows=1 width=1177) (actual time=1.040..1.047 rows=101 loops=1)
Sort Key: created DESC, id
Sort Method: top-N heapsort Memory: 304kB
Buffers: shared hit=101 read=10
-> Index Scan using test on objects_object (cost=0.42..57.14 rows=1 width=1177) (actual time=0.094..0.670 rows=232 loops=1)
Index Cond: ((collection_id = ANY ('{3381,3321,3312,3262,3068,2684,2508,2159,2158,2154,2157,2156}'::integer[])) AND (((data #>> '{tagline,id}'::text[]))::double precision = '8'::double precision))
Buffers: shared hit=95 read=10
Planning time: 416.365 ms
Execution time: 43.463 ms
(11 rows)
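If this column order is kept, the original single-column expression index is probably redundant for this query. A possible cleanup, only if no other query relies on the old index; the new index name below is just a placeholder:
DROP INDEX IF EXISTS index_tagline_id_float_51a27976;
-- give the experimental index a more descriptive name (hypothetical name)
ALTER INDEX test RENAME TO objects_object_collection_tagline_idx;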
There is an index on collection_id, but it's not being used on this query as far as I can see. Would ANALYZE on the table improve the performance? last_analyze is NULL (but last_autoanalyze shouldn't be).
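To see when the table's statistics were last refreshed, and to refresh them manually, something like the following should work against the standard statistics view:
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'objects_object';
-- refresh planner statistics for this table only
ANALYZE objects_object;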