I have a simple Postgres table. A simple query to count the total records takes ages. The table has 7.5 million records, and I am using a machine with 8 vCPUs and 32 GB of memory. The database is on the same machine.
Edit: added the query.
The following query is very slow:
SELECT * FROM import_csv WHERE processed = False ORDER BY id ASC OFFSET 1 LIMIT 10000
Output of explain:
$ explain SELECT * FROM import_csv WHERE processed = False ORDER BY id ASC OFFSET 1 LIMIT 10000
---------------------------------------------------------------------------------------------------------
 Limit  (cost=5.42..49915.17 rows=10000 width=1985)
   ->  Index Scan using import_csv_id_idx on import_csv  (cost=0.43..19144730.02 rows=3835870 width=1985)
         Filter: (NOT processed)
(3 rows)
My table is defined as follows:
      Column       |      Type      | Collation | Nullable | Default
-------------------+----------------+-----------+----------+---------
 id                | integer        |           |          |
 name              | character(500) |           |          |
 domain            | character(500) |           |          |
 year_founded      | real           |           |          |
 industry          | character(500) |           |          |
 size_range        | character(500) |           |          |
 locality          | character(500) |           |          |
 country           | character(500) |           |          |
 linkedinurl       | character(500) |           |          |
 employees         | integer        |           |          |
 processed         | boolean        |           | not null | false
 employee_estimate | integer        |           |          |
Indexes:
    "import_csv_id_idx" btree (id)
    "processed_idx" btree (processed)
Thank you
Edit 3:
# explain analyze SELECT * FROM import_csv WHERE processed = False ORDER BY id ASC OFFSET 1 LIMIT 10000;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5.42..49915.33 rows=10000 width=1985) (actual time=8331.070..8355.556 rows=10000 loops=1)
   ->  Index Scan using import_csv_id_idx on import_csv  (cost=0.43..19144790.06 rows=3835870 width=1985) (actual time=8331.067..8354.874 rows=10001 loops=1)
         Filter: (NOT processed)
         Rows Removed by Filter: 3482252
 Planning time: 0.081 ms
 Execution time: 8355.925 ms
(6 rows)
Output of explain (analyze, buffers):
# explain (analyze, buffers) SELECT * FROM import_csv WHERE processed = False ORDER BY id ASC OFFSET 1 LIMIT 10000;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5.42..49915.33 rows=10000 width=1985) (actual time=8236.899..8260.941 rows=10000 loops=1)
   Buffers: shared hit=724036 read=2187905 dirtied=17 written=35
   ->  Index Scan using import_csv_id_idx on import_csv  (cost=0.43..19144790.06 rows=3835870 width=1985) (actual time=8236.896..8260.104 rows=10001 loops=1)
         Filter: (NOT processed)
         Rows Removed by Filter: 3482252
         Buffers: shared hit=724036 read=2187905 dirtied=17 written=35
 Planning time: 0.386 ms
 Execution time: 8261.406 ms
(8 rows)