Optimal Postgres text index for LIKE query?

Question

Using Postgres 9.5, I have a table addresses.

CREATE TABLE addresses (
    id        integer PRIMARY KEY,
    address   text
);

In that table I have 7.5 million rows. Example:

1, "1600 Pennsylvania Avenue NW, Washington, DC, 20500"

I'm using this table for an autosuggest search in my application so I need to use this type of query:

SELECT * FROM addresses WHERE address LIKE '123 Main St%';

I created this index:

CREATE INDEX address_idx ON addresses (address);

But the problem is it's taking around 1 sec which is too slow.

Here's the query plan:

EXPLAIN SELECT * FROM addresses WHERE address LIKE '123 Main St%';
----
Seq Scan on addresses  (cost=0.00..161309.76 rows=740 width=41)
  Filter: (address ~~ '123 Main St%'::text)

I tried creating a few types of gin indexes but they either had no effect or made the query slower. I'm not sure if I was using them correctly though.

Any ideas on how to create an index that's optimized for this kind of query?

EDIT

The best solution found so far is to use a text range scan:

SELECT *
FROM addresses
WHERE address >= '123 Main St' AND
      address <= concat('123 Main St', 'z');

Will this always be a prefix search? Then you can try WHERE address BETWEEN '123 Main St' AND '123 Main Su'. That should produce a range scan on the index. — Thilo
– Thilo, Commented Mar 9, 2016 at 0:57
The index should work fine for this query. Perhaps there is an incompatibility with the text types somehow that prevents the use of the index. — Gordon Linoff
– Gordon Linoff, Commented Mar 9, 2016 at 1:39
@Thilo Thanks! Using BETWEEN produces the same results as the LIKE query and reduced the time down to 13ms. To answer your question, yes this will always be a prefix search. The only thing I don't like about this method is I'll have to come up with the next letter in the alphabet or the next number instead of using a wildcard symbol. Is there any other way to produce a range scan without having to write in that kind of logic? — Tyler
– Tyler, Commented Mar 9, 2016 at 1:51
Ideally, the query planner should have detected the prefix search and transformed it into a BETWEEN query. And even if not, worst case should have been an index scan (instead of a table scan) -- not that that would have been any faster in this case. — Thilo
– Thilo, Commented Mar 9, 2016 at 2:44

Gordon Linoff · Accepted Answer · 2016-03-09 02:29:28Z

5

This is an elaboration on the between method and too long for a comment.

If you are using standard ASCII characters, you can use the tilde-trick:

SELECT *
FROM addresses
WHERE address >= '123 Main St' AND
      address <= concat('123 Main St', '~');

Tilde has a larger ASCII value than other characters.

I do note that Postgres should use the index for the LIKE query as well. My guess is that the problem is something to do with compatibility of the types. Perhaps if you converted the pattern to a varchar(), Postgres would use the index.

edited Mar 9, 2016 at 2:29

answered Mar 9, 2016 at 1:57

Gordon Linoff

1.3m62 gold badges705 silver badges857 bronze badges

Sign up to request clarification or add additional context in comments.

8 Comments

Tyler Over a year ago

Thanks for the reply. Pretty clever indeed but I'm running into trouble with that query: ERROR: argument of AND must be type boolean, not type text - on line 4. I'm using standard ASCII characters btw. Could you elaborate on what you mean by convert the pattern to a varchar()? Do you mean the column?

Gordon Linoff Over a year ago

@Tyler . . . That might be due to operator precedence.

Tyler Over a year ago

Removing the double pipe and adding the tilde, i.e. '123 Main St~', solves the error but returns no results. I also tried '123 Main S~' which didn't work. Referencing the ASCII table, it seems like 'z' is the largest ASCII value character that I can use to return any results. Edit: concat('123 Main St', 'z') works.

Gordon Linoff Over a year ago

@Tyler . . . Interesting. The tilde has worked for me in other contexts. At least the z works, though.

Tyler Over a year ago

Thanks for your help. I think the range scan with your concat suggestion is the best solution!

|

Community · Accepted Answer · 2017-05-23 12:31:11Z

4

Three things you could try:

If your database is on 'C' locale (which you can check with \l on the psql prompt) then the regular Btree indexes should help in optimizing a LIKE 'abc%' type of query.
If not, you could try using a suitable operator class when creating the Btree index. For e.g. CREATE INDEX tbl_col_text_pattern_ops_idx ON tbl(col text_pattern_ops);
If that doesn't work, you could also try using the GiST / GIN, more detailing for which is given here.

If you'd like to know more you should read Erwin's StackOverflow answer here, that details how different Postgres indexes work with LIKE / ILIKE.

edited May 23, 2017 at 12:31

CommunityBot

11 silver badge

answered Mar 9, 2016 at 5:36

Robins Tharakan

2,52122 silver badges19 bronze badges

Collectives™ on Stack Overflow

Optimal Postgres text index for LIKE query?

2 Answers 2

8 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

8 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related