Slow postgres text column query

Question

I have a btree index on a text column that holds a status token. Queries that filter on this field run about 100x slower than queries that don't. What can I do to speed them up? The column has rather low cardinality. I've tried hash, btree with text_pattern_ops, and partial indexes. The collation is en_US.UTF-8. With the status index dropped, the queries run in about the same time.

db=# show lc_collate;
 lc_collate  
-------------
 en_US.UTF-8
(1 row)

db=# drop index job_status_idx;
DROP INDEX
db=# CREATE INDEX job_status_idx ON job(status text_pattern_ops);
CREATE INDEX

db=# select status, count(*) from job group by 1;
   status    | count  
-------------+--------
 pending     | 365027
 booked      |  37515
 submitted   |  20783
 cancelled   | 191707
 negotiating |     30
 completed   | 241339
 canceled    |     56
(7 rows)

db=# explain analyze SELECT key_ FROM "job" WHERE active = true and start > '2014-06-15T19:23:23.691670'::timestamp and status = 'completed' ORDER BY start DESC OFFSET 450 LIMIT 150;
                                                                    QUERY PLAN                                                                    
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5723.07..7630.61 rows=150 width=51) (actual time=634.978..638.086 rows=150 loops=1)
   ->  Index Scan Backward using job_start_idx on job  (cost=0.42..524054.39 rows=41209 width=51) (actual time=625.776..637.023 rows=600 loops=1)
         Index Cond: (start > '2014-06-15 19:23:23.69167'::timestamp without time zone)
         Filter: (active AND (status = 'completed'::text))
         Rows Removed by Filter: 94866
 Total runtime: 638.358 ms
(6 rows)



db=# explain analyze SELECT key_ FROM "job" WHERE active = true and start > '2014-06-15T19:23:23.691670'::timestamp ORDER BY start DESC OFFSET 450 LIMIT 150;
                                                                  QUERY PLAN                                                                   
-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1585.61..2114.01 rows=150 width=51) (actual time=4.620..6.333 rows=150 loops=1)
   ->  Index Scan Backward using job_start_idx on job  (cost=0.42..523679.58 rows=148661 width=51) (actual time=0.080..5.271 rows=600 loops=1)
         Index Cond: (start > '2014-06-15 19:23:23.69167'::timestamp without time zone)
         Filter: active
 Total runtime: 6.584 ms
(5 rows)

Gordon Linoff · Accepted Answer · 2014-07-16 01:21:30Z


This is your query:

SELECT key_
FROM "job"
WHERE active = true and
     start > '2014-06-15T19:23:23.691670'::timestamp and
     status = 'completed'
ORDER BY start DESC
OFFSET 450 LIMIT 150;

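The index on status alone is not very selective: 'completed' matches a large share of the table, so the planner walks job_start_idx instead and discards non-matching rows with a filter (94866 of them in your plan). I would suggest a composite index:

CREATE INDEX job_status_idx ON job(status text_pattern_ops, active, start, key_);

This is a covering index for the query: it contains every column the query references, so it should do a better job of matching the where clause (equality on status and active, then a range on start) and can potentially return key_ without visiting the table.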
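
To check whether the planner actually picks the composite index, the query can be re-run under EXPLAIN ANALYZE once the index is built. This is only a sketch, assuming the table and columns from the question; the index name job_status_active_start_idx is made up for illustration so that it does not clash with an existing job_status_idx.

```sql
-- Hypothetical index name; the columns are the ones used in the question.
CREATE INDEX job_status_active_start_idx
    ON job (status text_pattern_ops, active, start, key_);

-- Refresh planner statistics so the estimates reflect the current data.
ANALYZE job;

-- Look for job_status_active_start_idx in the plan, and compare the
-- "Rows Removed by Filter" count with the 94866 seen when the planner
-- used job_start_idx.
EXPLAIN ANALYZE
SELECT key_
FROM job
WHERE active = true
  AND start > '2014-06-15T19:23:23.691670'::timestamp
  AND status = 'completed'
ORDER BY start DESC
OFFSET 450 LIMIT 150;
```

If the plan still shows job_start_idx with a large filter count, the planner has judged the new index more expensive, and the estimated row counts in the plan are the first thing to inspect.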


3 Comments

Matt Billenstein Over a year ago

Okay, so this index works very well: CREATE INDEX job_start_status_idx ON job(start, status text_pattern_ops); The key is that start + status together have good selectivity. The query runs in about 15ms now.

Gordon Linoff Over a year ago

That is interesting. The covering is the important part then. It is using start for both the where and order by (presumably), which must be a big time saver.

Zegarek Over a year ago

Small update: starting with version 11, because the key_ column is not used for sorting or searching and is only needed to make the index covering, it can be moved out of the index column list and into an INCLUDE list: CREATE INDEX job_status_idx ON job(status text_pattern_ops, active, start) INCLUDE (key_);
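
A sketch of how the INCLUDE variant could be tried on PostgreSQL 11 or later, assuming the schema from the question. The index name job_status_incl_idx is hypothetical, and the VACUUM step is an assumption about what this particular table needs: index-only scans rely on the visibility map, which VACUUM keeps up to date.

```sql
-- Hypothetical index name; INCLUDE requires PostgreSQL 11 or later.
-- key_ is stored in the index as payload only, not as part of the search key.
CREATE INDEX job_status_incl_idx
    ON job (status text_pattern_ops, active, start) INCLUDE (key_);

-- Update the visibility map and planner statistics in one pass.
VACUUM (ANALYZE) job;

-- If the plan shows "Index Only Scan", its "Heap Fetches" line reports how
-- many rows still had to be read from the table itself.
EXPLAIN (ANALYZE, BUFFERS)
SELECT key_
FROM job
WHERE active = true
  AND start > '2014-06-15T19:23:23.691670'::timestamp
  AND status = 'completed'
ORDER BY start DESC
OFFSET 450 LIMIT 150;
```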
