I am looking for an idea to optimize my query.

Currently, I have a table with 4M rows, and I only want to retrieve the latest 1000 rows for a given reference:

SELECT * 
FROM customers_material_events 
WHERE reference = 'XXXXXX' 
ORDER BY date DESC 
LIMIT 1000;

This is the execution plan:

Limit  (cost=12512155.48..12512272.15 rows=1000 width=6807) (actual time=8953.545..9013.658 rows=1000 loops=1)
   Buffers: shared hit=16153 read=30342
   ->  Gather Merge  (cost=12512155.48..12840015.90 rows=2810036 width=6807) (actual time=8953.543..9013.613 rows=1000 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=16153 read=30342
         ->  Sort  (cost=12511155.46..12514668.00 rows=1405018 width=6807) (actual time=8865.186..8865.208 rows=632 loops=3)
               Sort Key: date DESC
               Sort Method: top-N heapsort  Memory: 330kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 328kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 330kB
               Buffers: shared hit=16153 read=30342
               ->  Parallel Seq Scan on customers_material_events  (cost=0.00..64165.96 rows=1405018 width=6807) (actual time=0.064..944.029 rows=1117807 loops=3)
                     Filter: ((reference)::text = 'FFFEEE'::text)
                     Rows Removed by Filter: 17188
                     Buffers: shared hit=16091 read=30342
 Planning Time: 0.189 ms
 Execution Time: 9013.834 ms
(18 rows)

The execution time is very slow (about 9 seconds).

  • Does the table have indexes? Commented Feb 26, 2019 at 10:33
  • Only 'id' is primary key Commented Feb 26, 2019 at 10:34
  • Then you’ll probably benefit from adding an index on reference if the data is suitable, which it seems to be Commented Feb 26, 2019 at 10:35
  • Ideally the index should be a multicolumn (reference, date) one to search and sort on. PostgreSQL would still need to access the table data for the other column data Commented Feb 26, 2019 at 10:38
  • ORDER BY a_column DESC LIMIT N quite often benefits from an index on a_column. As mentioned above, you can also add an index on reference, or on (reference, date) or (date, reference). Just experiment with adding an index, running ANALYZE customers_material_events, and measuring the speed. One of these indexes may speed up the query, but whether and by how much depends on the selectivity of both columns. Commented Feb 26, 2019 at 11:01

1 Answer

The ideal index for this query would be:

CREATE INDEX ON customers_material_events (reference, date);

That would allow PostgreSQL to quickly find the rows for a given reference, already ordered by date, so no extra sort step is necessary. Because a B-tree index can be scanned backward, the same index also serves ORDER BY date DESC.
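
To check whether the index actually helps, something along these lines could be run. This is only a sketch: the index name is made up for illustration, and using CONCURRENTLY is an assumption that the table must stay writable while the index builds (it cannot be used inside a transaction block).

```sql
-- Hypothetical index name, chosen here only for readability.
-- CONCURRENTLY builds the index without blocking writes to the table,
-- at the cost of a slower build.
CREATE INDEX CONCURRENTLY customers_material_events_reference_date_idx
    ON customers_material_events (reference, date);

-- Refresh planner statistics for the table.
ANALYZE customers_material_events;

-- Run the original query again and compare the plan and timing
-- with the one posted in the question.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM customers_material_events
WHERE reference = 'XXXXXX'
ORDER BY date DESC
LIMIT 1000;
```

If the planner picks the new index, the Parallel Seq Scan and Sort nodes from the original plan should no longer appear, and the plan should stop after reading about 1000 index entries instead of scanning the whole table.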
