
I have a simple Postgres table, and a simple query to count the total records takes ages. I have 7.5 million records in the table, and I'm using a machine with 8 vCPUs and 32 GB of memory. The database is on the same machine.

Edit: added the query.

The following query is very slow:

SELECT * FROM import_csv WHERE processed = False ORDER BY id ASC OFFSET 1 LIMIT 10000

Output of explain

$ explain SELECT * FROM import_csv WHERE processed = False ORDER BY id ASC OFFSET 1 LIMIT 10000

                                                QUERY PLAN                                                 
---------------------------------------------------------------------------------------------------------
 Limit  (cost=5.42..49915.17 rows=10000 width=1985)
   ->  Index Scan using import_csv_id_idx on import_csv  (cost=0.43..19144730.02 rows=3835870 width=1985)
         Filter: (NOT processed)
(3 rows)

My table is as below:

      Column       |      Type      | Collation | Nullable | Default 
-------------------+----------------+-----------+----------+---------
 id                | integer        |           |          | 
 name              | character(500) |           |          | 
 domain            | character(500) |           |          | 
 year_founded      | real           |           |          | 
 industry          | character(500) |           |          | 
 size_range        | character(500) |           |          | 
 locality          | character(500) |           |          | 
 country           | character(500) |           |          | 
 linkedinurl       | character(500) |           |          | 
 employees         | integer        |           |          | 
 processed         | boolean        |           | not null | false
 employee_estimate | integer        |           |          | 
Indexes:
    "import_csv_id_idx" btree (id)
    "processed_idx" btree (processed)

Thank you

Edit 3:

# explain analyze SELECT * FROM import_csv WHERE processed = False ORDER BY id ASC OFFSET 1 LIMIT 10000;
                                                                          QUERY PLAN                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5.42..49915.33 rows=10000 width=1985) (actual time=8331.070..8355.556 rows=10000 loops=1)
   ->  Index Scan using import_csv_id_idx on import_csv  (cost=0.43..19144790.06 rows=3835870 width=1985) (actual time=8331.067..8354.874 rows=10001 loops=1)
         Filter: (NOT processed)
         Rows Removed by Filter: 3482252
 Planning time: 0.081 ms
 Execution time: 8355.925 ms
(6 rows)

Output of explain (analyze, buffers):

# explain (analyze, buffers) SELECT * FROM import_csv WHERE processed = False ORDER BY id ASC OFFSET 1 LIMIT 10000;


                                                                          QUERY PLAN                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5.42..49915.33 rows=10000 width=1985) (actual time=8236.899..8260.941 rows=10000 loops=1)
   Buffers: shared hit=724036 read=2187905 dirtied=17 written=35
   ->  Index Scan using import_csv_id_idx on import_csv  (cost=0.43..19144790.06 rows=3835870 width=1985) (actual time=8236.896..8260.104 rows=10001 loops=1)
         Filter: (NOT processed)
         Rows Removed by Filter: 3482252
         Buffers: shared hit=724036 read=2187905 dirtied=17 written=35
 Planning time: 0.386 ms
 Execution time: 8261.406 ms
(8 rows)
  • Please edit your question and add the execution plan generated using explain (analyze, buffers, format text), not just a "simple" explain. Commented Apr 23, 2020 at 9:57
  • OK, but a query like this is also very slow: SELECT * FROM import_csv WHERE processed = False ORDER BY id ASC OFFSET 1 LIMIT 10000 Commented Apr 23, 2020 at 10:28
  • Sorry a_horse_with_no_name, I edited the question; I had actually removed the explain part. Commented Apr 23, 2020 at 10:31
  • The execution plan is important. Please add the one generated using explain (analyze, buffers), which will contain more information than a "simple" explain. Commented Apr 23, 2020 at 10:37
  • But that's the output of a "simple" explain, not the output of explain (analyze, buffers). Commented Apr 23, 2020 at 14:03

1 Answer


It is slow because it has to dig through 3482252 rows that fail the processed = False criterion before finding the 10001st one that passes (OFFSET 1 plus LIMIT 10000 means 10001 matching rows are needed), and apparently all those failing rows are scattered randomly around the table, leading to a lot of slow random I/O.
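
One way to sanity-check the "scattered randomly" claim (this query is my suggestion, not part of the original answer): pg_stats reports how well each column's values correlate with the physical row order, and a correlation near zero for id means an index scan on id hits a different heap page for almost every row.

SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'import_csv' AND attname = 'id';

The buffers output above points the same way: roughly 2.9 million pages (shared hit + read) were touched while scanning about 3.5 million rows, i.e. nearly one page per row.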

You need either an index on (processed, id), or a partial index on (id) WHERE processed = false.
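
A minimal sketch of both options (the index names here are my own; PostgreSQL doesn't care what you call them):

CREATE INDEX import_csv_processed_id_idx ON import_csv (processed, id);

-- or the partial variant, which is smaller because it contains only the unprocessed rows:
CREATE INDEX import_csv_unprocessed_id_idx ON import_csv (id) WHERE processed = false;

Either one lets the planner read only the rows with processed = false, already in id order, instead of walking the full id index and filtering rows out.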

If you do the first of these, you can drop the index on processed alone, as it would no longer be independently useful (if it ever were to start with).
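
For example, after creating the (processed, id) index, using the index name from the table definition above:

DROP INDEX processed_idx;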

1 Comment

Thank you. I wonder, though, what the reason is for indexing (processed, id); I just want to know more.
