1

If I have a table with millions of records where I am performing a wildcard search, should I be indexing on all columns?

Example:

table.account_id = X 
AND (table.col1 LIKE '%abc%' 
     OR table.col2 LIKE %abc% 
     OR LOWER(table.col3) LIKE %abc% 
     OR LOWER(table.col4) LIKE %abc%)

That is returning in about 1-2s depending on what I am searching. However, I need this to be < .5s since it is an autocomplete field.

My question is if I should be creating an index such as:

CREATE INDEX ix_xxxx ON table (col1, col2,lower(col3),lower(col4), owner_firm_id)

or should I be looking at to_tsvector or text_pattern_ops in the index?

I did try something like the below but it didn't get it to < 1s.

CREATE INDEX ix_name ON table
            USING gist (col1 gist_trgm_ops, col2 gist_trgm_ops, LOWER(col3) gist_trgm_ops, col4 gist_trgm_ops, account_id);

1
  • This observation may (or may not) help you but I'd recommend having a look at ElasticSearch. It may be that you can't alter the architecture of your system, or the project is too far down the road to think of such a change but if you have some freedom to change things I'd suggest a rethink. Your db could be your "one source of truth" from which you build your ES index. ES's text searching performance will beat PG's. Commented Oct 30, 2020 at 15:22

2 Answers 2

3

If you perform a substring search, use a trigram index:

CREATE EXTENSION pg_trgm;

CREATE INDEX ON atable USING gin ((col1 || ' ' || col2 ... gin_trgm_ops);

Then query like

WHERE col1 || ' ' || col2 || ... ILIKE '%abcd%'

For decent performance, don't start searching until the string has reached a length of 4 or so.

Sign up to request clarification or add additional context in comments.

Comments

2

Your GiST index is not treating col4 the same way as the query treats it. Also, GiST indexes are sensitive to column order, and the column which can eliminate the most rows with the least work should go first. Presumably this would be account_id:

CREATE INDEX ix_name ON table
   USING gist (account_id, col1 gist_trgm_ops, col2 gist_trgm_ops, LOWER(col3) gist_trgm_ops, lower(col4) gist_trgm_ops);

The same index only in GIN rather than GiST form might be effective, you can try both and see. That one would not be sensitive to the order the columns are specified in the index.

The pg_trgm indexes operators support LIKE and ILIKE equally well. Rather than doing lower on two of the columns, you could just query with ILIKE rather than LIKE on the original columns and build the index on the original columns. This probably won't change the performance much, but the simpler indexes will be more flexible.

But is it really necessary to having the leading '%' in your queries? Do people expect autocomplete to work when they start typing from the middle?

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.