Postgres Multiple Column Text Indexing

Question

If I have a table with millions of records where I am performing a wildcard search, should I be indexing on all columns?

Example:

table.account_id = X 
AND (table.col1 LIKE '%abc%' 
     OR table.col2 LIKE %abc% 
     OR LOWER(table.col3) LIKE %abc% 
     OR LOWER(table.col4) LIKE %abc%)

That is returning in about 1-2s depending on what I am searching. However, I need this to be < .5s since it is an autocomplete field.

My question is if I should be creating an index such as:

CREATE INDEX ix_xxxx ON table (col1, col2,lower(col3),lower(col4), owner_firm_id)

or should I be looking at to_tsvector or text_pattern_ops in the index?

I did try something like the below but it didn't get it to < 1s.

CREATE INDEX ix_name ON table
            USING gist (col1 gist_trgm_ops, col2 gist_trgm_ops, LOWER(col3) gist_trgm_ops, col4 gist_trgm_ops, account_id);

This observation may (or may not) help you but I'd recommend having a look at ElasticSearch. It may be that you can't alter the architecture of your system, or the project is too far down the road to think of such a change but if you have some freedom to change things I'd suggest a rethink. Your db could be your "one source of truth" from which you build your ES index. ES's text searching performance will beat PG's. — Adam Benson
– Adam Benson, Commented Oct 30, 2020 at 15:22

Laurenz Albe · Accepted Answer · 2020-10-30 07:22:59Z

3

If you perform a substring search, use a trigram index:

CREATE EXTENSION pg_trgm;

CREATE INDEX ON atable USING gin ((col1 || ' ' || col2 ... gin_trgm_ops);

Then query like

WHERE col1 || ' ' || col2 || ... ILIKE '%abcd%'

For decent performance, don't start searching until the string has reached a length of 4 or so.

edited Oct 30, 2020 at 7:22

answered Oct 30, 2020 at 7:04

Laurenz Albe

257k22 gold badges312 silver badges388 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

jjanes · Accepted Answer · 2020-10-30 15:54:05Z

Your GiST index is not treating col4 the same way as the query treats it. Also, GiST indexes are sensitive to column order, and the column which can eliminate the most rows with the least work should go first. Presumably this would be account_id:

CREATE INDEX ix_name ON table
   USING gist (account_id, col1 gist_trgm_ops, col2 gist_trgm_ops, LOWER(col3) gist_trgm_ops, lower(col4) gist_trgm_ops);

The same index only in GIN rather than GiST form might be effective, you can try both and see. That one would not be sensitive to the order the columns are specified in the index.

The pg_trgm indexes operators support LIKE and ILIKE equally well. Rather than doing lower on two of the columns, you could just query with ILIKE rather than LIKE on the original columns and build the index on the original columns. This probably won't change the performance much, but the simpler indexes will be more flexible.

But is it really necessary to having the leading '%' in your queries? Do people expect autocomplete to work when they start typing from the middle?

Collectives™ on Stack Overflow

Postgres Multiple Column Text Indexing

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related