Postgres LIKE '...%' doesn't use index

Question

I have a table that I want to search by a prefix of the primary key. The primary key has values like 03.000221.1, 03.000221.2, 03.000221.3, etc., and I want to retrieve all rows whose key begins with the prefix 03.000221. (trailing dot included).

My first thought was to filter with LIKE '03.000221.%', thinking Postgres would be smart enough to look up 03.000221. in the index and perform a range scan from that point. But no, this performs a sequential scan.

                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..253626.34 rows=78 width=669)
   Workers Planned: 2
   ->  Parallel Seq Scan on ...  (cost=0.00..252618.54 rows=32 width=669)
         Filter: ((id ~~ '03.000221.%'::text)
 JIT:
   Functions: 2
   Options: Inlining false, Optimization false, Expressions true, Deforming true

If I do an equivalent operation using a plain >= and < range, e.g. id >= '03.000221.' and id < '03.000221.Z', it does use the index:

                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using ... on ...  (cost=0.56..8.58 rows=1 width=669)
   Index Cond: ((id >= '03.000221.'::text) AND (id < '03.000221.Z'::text))

But this is dirtier and it seems to me that Postgres should be able to deduce it can do an equivalent index range lookup with LIKE. Why doesn't it?

jjanes · Accepted Answer · 2020-04-26 00:17:40Z

PostgreSQL will do this if you build the index with the text_pattern_ops operator class, or if you are using the C collation.
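
A minimal sketch of the first option. The table name items and the index name are hypothetical (the question elides the real names); id is the text primary key shown in the plans above.

```sql
-- Hypothetical names: "items" and "items_id_pattern_idx" are placeholders.
-- text_pattern_ops makes the btree compare strings character by character,
-- ignoring the database collation, so a left-anchored LIKE can use it.
CREATE INDEX items_id_pattern_idx ON items (id text_pattern_ops);

-- Re-check the plan for the prefix search from the question.
EXPLAIN SELECT * FROM items WHERE id LIKE '03.000221.%';
```

With such an index in place, the plan should be able to show an Index Cond instead of a Filter. Note that an index built with text_pattern_ops does not serve ordinary <, <=, >, >= comparisons under a non-C collation, so it is an addition to the regular primary key index rather than a replacement for it.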

If you are using some random other collation, PostgreSQL can't deduce much of anything about it. Observe this, in the very common "en_US.utf8" collation.

select * from (values ('03.000221.1'), ('03.0002212'), ('03.000221.3')) f(x) order by x;
      x      
-------------
 03.000221.1
 03.0002212
 03.000221.3

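Which then naturally leads to this wrong answer with your range query, because 03.0002212 sorts inside the range under this collation even though it does not match LIKE '03.000221.%':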

select * from (values ('03.000221.1'), ('03.0002212'), ('03.000221.3')) f(id)
    where ((id >= '03.000221.'::text) AND (id < '03.000221.Z'::text))
     id      
-------------
 03.000221.1
 03.0002212
 03.000221.3
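
For comparison, here is the same range query with the comparisons forced to the C collation, as a sketch of why the rewrite is safe there. The explicit collate "C" clauses are an addition for illustration and are not part of the original query.

```sql
-- Same sample values as above, but compared byte by byte via COLLATE "C".
-- Under "C", '2' (0x32) sorts after '.' (0x2E), so '03.0002212' is
-- greater than '03.000221.Z' and falls outside the range.
select * from (values ('03.000221.1'), ('03.0002212'), ('03.000221.3')) f(id)
    where id collate "C" >= '03.000221.' and id collate "C" < '03.000221.Z'
    order by id collate "C";
-- returns 03.000221.1 and 03.000221.3 only
```

In a byte-wise ordering, every string starting with the prefix is contiguous, which is what lets the planner turn the LIKE into an index range.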

edited Apr 26, 2020 at 0:17

answered Apr 25, 2020 at 12:16

jjanes

3 Comments

user330315 Over a year ago

For the given values, collate "C" is probably the best choice

Toni Cárdenas Over a year ago

I'm using C.UTF-8, which apparently isn't C enough. Thanks!

jjanes Over a year ago

@ToniCárdenas I've never understood the difference between C and C.UTF-8. I think maybe C is implemented internally as a special case, while C.UTF-8 is outsourced to glibc. It probably could use the index over C.UTF-8 and get the right answer, it just doesn't know that it could.
