
I have the following table:

CREATE TABLE products (
  id   bigserial NOT NULL PRIMARY KEY,
  name varchar(2048)
  -- Many other columns
);

I want to run a case- and diacritic-insensitive LIKE query on name.

For that I have created the following function:

CREATE EXTENSION IF NOT EXISTS unaccent;
CREATE OR REPLACE FUNCTION immutable_unaccent(varchar)
  RETURNS text AS $$
    SELECT unaccent($1)
  $$ LANGUAGE sql IMMUTABLE;

And then created an index on name using this function:

CREATE INDEX products_search_name_key ON products(immutable_unaccent(name));

However, when I run a query, it is very slow (about 2.5s for 300k rows). I'm pretty sure PostgreSQL is not using the index:

-- Slow (~2.5s for 300k rows)
SELECT products.* FROM products
    WHERE immutable_unaccent(products.name) LIKE immutable_unaccent('%Hello world%')

-- Fast (~60ms for 300k rows), and there is no index
SELECT products.* FROM products
    WHERE products.name LIKE '%Hello world%'

I have tried creating a separate column with a case and diacritics insensitive copy of the name like so, and in that case the query is fast:

ALTER TABLE products ADD search_name varchar(2048);
UPDATE products
    SET search_name = immutable_unaccent(name);

-- Fast (~60ms for 300k rows), and there is no index
SELECT products.* FROM products
    WHERE products.search_name LIKE immutable_unaccent('%Hello world%')

What am I doing wrong? Why doesn't my index approach work?

Edit: Execution plan for the slow query

explain analyze SELECT products.* FROM products
    WHERE immutable_unaccent(products.name) LIKE immutable_unaccent('%Hello world%')

Seq Scan on products  (cost=0.00..79568.32 rows=28 width=2020) (actual time=1896.131..1896.131 rows=0 loops=1)
  Filter: (immutable_unaccent(name) ~~ '%Hello world%'::text)
  Rows Removed by Filter: 277986
Planning time: 1.014 ms
Execution time: 1896.220 ms
  • add execution plan - explain analyze SELECT products.* FROM products WHERE immutable_unaccent(products.name) LIKE immutable_unaccent('%Hello world%') Commented Apr 8, 2017 at 8:42
  • @ClodoaldoNeto Now that you say it, it makes sense I guess. I was hoping it would use the index because it should already contain the immutable_unaccent computed value. Is using a copy column the only way then? Commented Apr 8, 2017 at 9:05
  • Even if you copy the column unaccented, you're still not going to be able to use an index with LIKE patterns starting with %. Postgres has pretty good full text search functionality built in, however; maybe you should have a look at that. postgresql.org/docs/current/static/textsearch-intro.html Commented Apr 8, 2017 at 13:55
  • @Ede Thanks. I understand my queries still don't use an index, but at least the unaccent function is already precomputed, which is much faster. I will take a look at full text search and see if it can solve my problem better. Commented Apr 9, 2017 at 8:17
  • @ClodoaldoNeto: I wouldn't say never... it will if you use a gist/gin trigram index (see my answer below) Commented Apr 17, 2017 at 14:36

1 Answer


If you want to run a LIKE '%hello world%' type of query, you need a different kind of index: a plain B-tree index cannot be used for a pattern that starts with %.

(You may need to install a couple of contrib modules first. To do so, log in as the postgres admin/root user and issue the following commands.)

Prerequisite:

CREATE EXTENSION pg_trgm;        -- provides the trigram operator classes used below
CREATE EXTENSION fuzzystrmatch;  -- not required by the trigram index itself

Try the following:

create index on products using gist (immutable_unaccent(name) gist_trgm_ops);

At that point your query should be able to use the index:

select * from products
where immutable_unaccent(name) like '%Hello world%';

Note: this index could get large, depending on how long the names actually are (the column allows up to 2048 characters).
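
One way to check whether the planner actually picks up the trigram index is to look at the plan again. This is a minimal sketch, assuming pg_trgm is installed and the GiST index above has been created. The exact plan will depend on your PostgreSQL version and table statistics.

```sql
-- Refresh planner statistics after building the index.
ANALYZE products;

-- Same query shape as in the question, against the indexed expression.
EXPLAIN ANALYZE
SELECT *
FROM products
WHERE immutable_unaccent(name) LIKE immutable_unaccent('%Hello world%');
```

If the index is chosen, the plan should show an index scan or bitmap index scan instead of the Seq Scan from the question. Since immutable_unaccent only strips diacritics, a case-insensitive match would still need ILIKE in place of LIKE; trigram indexes support ILIKE as well.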

You could also use full text search, but that's a lot more complicated.

What the above does is index "trigrams" of the name, i.e. each run of three consecutive characters within each word. So if the product is called "hello world", it would index hel, ell, llo, wor, orl, and rld, along with space-padded trigrams at the word boundaries such as " he" and "lo ". A search term is broken into trigrams the same way, so the index can narrow down the candidate rows efficiently. You can use either a GiST or a GIN index type if you like.

Basically, GiST will be slightly slower to query but faster to update; GIN is the opposite.
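
To see which trigrams are extracted for a given string, pg_trgm ships a show_trgm() function. A small sketch, assuming the pg_trgm extension and the immutable_unaccent function from the question are installed; the accented sample string is made up for illustration.

```sql
-- List the trigrams pg_trgm extracts from a plain string.
SELECT show_trgm('hello world');

-- Same, after stripping diacritics with the function from the question
-- (the sample value is hypothetical).
SELECT show_trgm(immutable_unaccent('Héllo wörld'));
```

Comparing the two results is a quick way to confirm that the unaccented form of a name produces the trigrams you expect to search on.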
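
For comparison, the GIN variant of the same index might look like the following. This is only a sketch: the index name is made up, and which of the two types performs better depends on your data and on how often the table is updated.

```sql
-- GIN trigram index on the unaccented name (index name is hypothetical).
CREATE INDEX products_name_trgm_gin
    ON products
    USING gin (immutable_unaccent(name) gin_trgm_ops);
```

Only one of the two indexes is needed; building both mainly makes sense for comparing them with EXPLAIN ANALYZE before dropping the slower one.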


2 Comments

Thanks, I will try to take a look at that option
Let me know how it works. I added instructions on installing the prerequisite contrib modules that come with PG, so please review the new instructions if you had any trouble with the prior instructions.
