Postgres query slow with index only scan

Question

I have a table call_logs and it contains an id, device_id, timestamp variable along with some other fields. I am currently trying to write a query that returns the last call, if it is working for each device. Currently my query is this:

SELECT DISTINCT ON (device_id) c.device_id, c.timestamp, c.working, c.id
FROM call_logs c
ORDER BY c.device_id, c.timestamp desc;

and it returns the information I want. But my production server is now getting rather large and I have around 6,000,000 records in the table.

I have added an index to this table:

CREATE INDEX cl_device_timestamp
ON public.call_logs USING btree
(device_id, timestamp DESC, id, working)
TABLESPACE pg_default;

But I am getting what I consider to be very slow time: Here is an explain analyse f the query:

EXPLAIN ANALYSE SELECT DISTINCT ON (device_id) c.device_id, c.timestamp, c.working, c.id
                                                     FROM call_logs c
                                                      ORDER BY c.device_id, c.timestamp desc;
    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=0.56..363803.37 rows=120 width=25) (actual time=0.069..2171.201 rows=124 loops=1)
   ->  Index Only Scan using cl_device_timestamp on call_logs c  (cost=0.56..347982.87 rows=6328197 width=25) (actual time=0.067..1594.953 rows=6331024 loops=1)
         Heap Fetches: 8051
 Planning time: 0.184 ms
 Execution time: 2171.281 ms
(5 rows)

I only have 124 unique device_id. I would not have thought this would be a slow process with the index? Any ideas what is going wrong? Or why it is so slow?

How's the execution time if you remove the DISTINCT? If you just want the last call, can't you add LIMIT 1 and make the DISTINCT unnecessary? — Andy Carlson
– Andy Carlson, Commented Aug 9, 2017 at 3:28
try to avoid distinct, see this: dba.stackexchange.com/questions/93158/… — Tisp
– Tisp, Commented Aug 9, 2017 at 3:33

user1434177 · Accepted Answer · 2017-08-10 07:25:14Z

I ended up doing this:

SELECT DISTINCT d.id, c.timestamp, c.id, c.working
FROM devices d
INNER JOIN call_logs c on d.id = c.device_id AND c.timestamp = (SELECT max(t.timestamp) FROM call_logs t WHERE t.device_id = d.id)

and it ended up being alot better

Unique  (cost=607.92..608.06 rows=11 width=25) (actual time=3.291..3.344 rows=117 loops=1)
   ->  Sort  (cost=607.92..607.95 rows=11 width=25) (actual time=3.289..3.310 rows=117 loops=1)
         Sort Key: d.id, c."timestamp", c.id, c.working
         Sort Method: quicksort  Memory: 34kB
         ->  Nested Loop  (cost=1.05..607.73 rows=11 width=25) (actual time=0.057..3.162 rows=117 loops=1)
               ->  Seq Scan on devices d  (cost=0.00..4.18 rows=118 width=8) (actual time=0.006..0.029 rows=119 loops=1)
               ->  Index Only Scan using cl_device_timestamp on call_logs c  (cost=1.05..5.10 rows=1 width=25) (actual time=0.007..0.007 rows=1 loops=119)
                     Index Cond: ((device_id = d.id) AND ("timestamp" = (SubPlan 2)))
                     Heap Fetches: 110
                     SubPlan 2
                       ->  Result  (cost=0.48..0.49 rows=1 width=8) (actual time=0.018..0.018 rows=1 loops=119)
                             InitPlan 1 (returns $1)
                               ->  Limit  (cost=0.43..0.48 rows=1 width=8) (actual time=0.017..0.017 rows=1 loops=119)
                                     ->  Index Only Scan Backward using test1 on call_logs t  (cost=0.43..2674.01 rows=52483 width=8) (actual time=0.017..0.017 rows=1 loops=119)
                                           Index Cond: ((device_id = d.id) AND ("timestamp" IS NOT NULL))
                                           Heap Fetches: 110
 Planning time: 0.645 ms
 Execution time: 3.461 ms
(18 rows)

Vao Tsun · Accepted Answer · 2017-08-09 09:27:33Z

your index is on 4 columns, not one. You cant estimate the size and efficiency on composite index based on one out of four columns data distribution.

Next - the fact that you have only 124 distinct devices does not mean faster index. Opposite - the less distinctive values split tree to less parts, and thus parts are bigger. Eg bigserial on one million bigint values has a million distinct values and the exact id is got very fast. While boolean column index scan has only two(three) values and thus takes much longer.

Last argument - two seconds is very slow, indeed. But considering the fact you scan 6 million rows, comparing the timestamp, 2 seconds become quite acceptable I'd say.

You can sacrifice OLTP speed and create some trigger that will save last timestamp on data change for each device, etc pairs to some external table. Then selecting such pre-aggregated values from short external table will take microseconds for 127 devices.

Collectives™ on Stack Overflow

Postgres query slow with index only scan

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related