I have a table called "nodes" with roughly 1.7 million rows in my PostgreSQL db

=# \d nodes
            Table "public.nodes"
 Column |          Type          | Modifiers 
--------+------------------------+-----------
 id     | integer                | not null
 title  | character varying(256) | 
 score  | double precision       | 
Indexes:
    "nodes_pkey" PRIMARY KEY, btree (id)

I want to use information from that table to autocomplete a search field, showing the user the ten highest-scoring titles that match their input. So I used this query (here searching for all titles starting with "s"):

=# explain analyze select title,score from nodes where title ilike 's%' order by score desc; 
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Sort  (cost=64177.92..64581.38 rows=161385 width=25) (actual time=4930.334..5047.321 rows=161264 loops=1)
   Sort Key: score
   Sort Method:  external merge  Disk: 5712kB
   ->  Seq Scan on nodes  (cost=0.00..46630.50 rows=161385 width=25) (actual time=0.611..4464.413 rows=161264 loops=1)
         Filter: ((title)::text ~~* 's%'::text)
 Total runtime: 5260.791 ms
(6 rows)

This was much too slow to use for autocompletion. With some information from Using PostgreSQL in Web 2.0 Applications I was able to improve it with a special index:

=# create index title_idx on nodes using btree(lower(title) text_pattern_ops);
=# explain analyze select title,score from nodes where lower(title) like lower('s%') order by score desc limit 10;
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=18122.41..18122.43 rows=10 width=25) (actual time=1324.703..1324.708 rows=10 loops=1)
   ->  Sort  (cost=18122.41..18144.60 rows=8876 width=25) (actual time=1324.700..1324.702 rows=10 loops=1)
         Sort Key: score
         Sort Method:  top-N heapsort  Memory: 17kB
         ->  Bitmap Heap Scan on nodes  (cost=243.53..17930.60 rows=8876 width=25) (actual time=96.124..1227.203 rows=161264 loops=1)
               Filter: (lower((title)::text) ~~ 's%'::text)
               ->  Bitmap Index Scan on title_idx  (cost=0.00..241.31 rows=8876 width=0) (actual time=90.059..90.059 rows=161264 loops=1)
                     Index Cond: ((lower((title)::text) ~>=~ 's'::text) AND (lower((title)::text) ~<~ 't'::text))
 Total runtime: 1325.085 ms
(9 rows)

So this gave me a factor-of-4 speedup. But can it be improved further? What if I want to search for '%s%' instead of 's%'? Do I have any chance of getting decent performance out of PostgreSQL in that case, too? Or would I be better off with a different solution (Lucene? Sphinx?) for my autocomplete feature?

4 Answers

You will need a text_pattern_ops index if you're not in the C locale.

See: index types.
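
For reference, a quick way to check which locale you are in (the title_idx created in the question is already such a text_pattern_ops index):

-- If this shows anything other than 'C', a plain btree index cannot
-- serve LIKE prefix searches and text_pattern_ops is needed.
SHOW lc_collate;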

1 Comment

text_pattern_ops is currently mentioned on this page btw: postgresql.org/docs/17/indexes-opclass.html

Tips for further investigation:

  • Partition the table on the title key. This makes the lists that PostgreSQL needs to work with smaller.

  • Give PostgreSQL more memory so the cache hit rate is above 98%. This table will take about 0.5 GB; 2 GB should be no problem nowadays. Make sure statistics collection is enabled, and read up on the pg_stats views (a query to check the hit rate follows this list).

  • Make a second table with a reduced substring of the title, e.g. 12 characters, so the complete table fits in fewer database blocks. An index on a substring may also work, but requires careful querying.

  • The longer the substring, the faster the query will run. Create a separate table for small substrings, and store as the value the top ten (or however many) choices you would want to show; see the sketch after this list. There are about 20,000 combinations of 1-, 2-, and 3-character strings.

  • You can use the same idea if you want to have %abc% queries, but at that point switching to Lucene probably makes sense.
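
On the second tip, one way to check the cache hit rate is to query the standard statistics views (a sketch; shared_buffers is the setting to raise if the rate is low):

-- Fraction of table reads served from shared buffers rather than disk.
SELECT sum(heap_blks_hit)::float
       / nullif(sum(heap_blks_hit + heap_blks_read), 0) AS cache_hit_rate
  FROM pg_statio_user_tables;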
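
And a rough sketch of the small-substrings lookup table from the fourth tip (the table and column names are invented, the table would have to be refreshed as nodes changes, and titles containing LIKE wildcards are ignored for brevity):

-- Precompute the top-ten titles for every 1-3 character prefix.
CREATE TABLE prefix_top10 (
    prefix text PRIMARY KEY,
    titles text[] NOT NULL
);

INSERT INTO prefix_top10 (prefix, titles)
SELECT p.prefix,
       ARRAY(SELECT n.title
               FROM nodes n
              WHERE lower(n.title) LIKE p.prefix || '%'
              ORDER BY n.score DESC
              LIMIT 10)
  FROM (SELECT DISTINCT substr(lower(title), 1, len) AS prefix
          FROM nodes, generate_series(1, 3) AS len
         WHERE length(title) >= len) p;

-- Autocomplete for short inputs becomes a single primary-key lookup:
SELECT titles FROM prefix_top10 WHERE prefix = lower('s');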

You're obviously not interested in 150,000+ results, so you should limit them:

select title,score
  from nodes
  where title ilike 's%'
  order by score desc
  limit 10;

You can also consider creating a functional index and querying with ">=" and "<":

create index nodes_title_lower_idx on nodes (lower(title));
select title,score
  from nodes
  where lower(title)>='s' and lower(title)<'t'
  order by score desc
  limit 10;

You should also create an index on score, which will help in the ILIKE '%s%' case.
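
For instance (a sketch; the index name is arbitrary, and whether the planner actually uses it depends on how common the pattern is):

CREATE INDEX nodes_score_idx ON nodes (score DESC);

-- With an index matching the ORDER BY, the planner can walk the scores
-- from the top and stop as soon as ten rows pass the filter, instead of
-- scanning and sorting every match.
SELECT title, score
  FROM nodes
 WHERE title ILIKE '%s%'
 ORDER BY score DESC
 LIMIT 10;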

Simultaneous prefix + suffix matching with LIKE '%abc%' can be sped up with GIN + pg_trgm

Usage:

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE TABLE "mytable" ("col1" TEXT);
CREATE INDEX "mytable_col1_gin" ON "mytable" USING gin("col1" gin_trgm_ops);
EXPLAIN SELECT * FROM "mytable" WHERE "col1" LIKE '%abc%';

which produces:

                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Bitmap Heap Scan on mytable  (cost=15.10..104.10 rows=400 width=72)
   Recheck Cond: (col1 ~~ '%abc%'::text)
   ->  Bitmap Index Scan on mytable_col1_gin  (cost=0.00..15.00 rows=400 width=0)
         Index Cond: (col1 ~~ '%abc%'::text)

which shows that the trigram index is used (a Bitmap Index Scan rather than a sequential scan), i.e. the query is sped up.

Notes:

A fun way to test this out is to generate a massive test database, e.g. with a generate_series insert along these lines (a sketch; the row count and repeat factor are arbitrary knobs for reaching the size you want):
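
INSERT INTO "mytable" ("col1")
SELECT repeat(md5(i::text), 32)               -- ~1 kB of pseudo-random text per row
  FROM generate_series(1, 10000000) AS s(i);  -- ~10 million rows, roughly 10 GB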

and then see if our queries are actually faster. I recommend at least 10 GB of text; below that, the indexing speedup was not very clear on my system.

Tested on Ubuntu 24.04, PostgreSQL 16.6.
