I am looking for an idea to optimize my query.
Currently, I have a table of 4M lines, I only want to retrieve the last 1000 lines of a reference:
SELECT *
FROM customers_material_events
WHERE reference = 'XXXXXX'
ORDER BY date DESC
LIMIT 1000;
This is the execution plan:
Limit (cost=12512155.48..12512272.15 rows=1000 width=6807) (actual time=8953.545..9013.658 rows=1000 loops=1)
Buffers: shared hit=16153 read=30342
-> Gather Merge (cost=12512155.48..12840015.90 rows=2810036 width=6807) (actual time=8953.543..9013.613 rows=1000 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=16153 read=30342
-> Sort (cost=12511155.46..12514668.00 rows=1405018 width=6807) (actual time=8865.186..8865.208 rows=632 loops=3)
Sort Key: date DESC
Sort Method: top-N heapsort Memory: 330kB
Worker 0: Sort Method: top-N heapsort Memory: 328kB
Worker 1: Sort Method: top-N heapsort Memory: 330kB
Buffers: shared hit=16153 read=30342
-> Parallel Seq Scan on customers_material_events (cost=0.00..64165.96 rows=1405018 width=6807) (actual time=0.064..944.029 rows=1117807 loops=3)
Filter: ((reference)::text = 'FFFEEE'::text)
Rows Removed by Filter: 17188
Buffers: shared hit=16091 read=30342
Planning Time: 0.189 ms
Execution Time: 9013.834 ms
(18 rows)
I see the execution time is very very slow...
referenceif the data is suitable, which it seems to beORDER BY a_column DESC LIMIT Nquite often benefits from an index ona_column. As mentioned above, you can also add an indexreferenceand(reference, date)or(date, reference). Just experiment with adding an index, doingANALYZE customers_material_eventsand measure the speed -- one of these indexes can also speed up the query, but if and how much really depends on the selectivity of both columns.