10

Using Postgres 9.5, I have a table addresses.

CREATE TABLE addresses (
    id        integer PRIMARY KEY,
    address   text
);

In that table I have 7.5 million rows. Example:

1, "1600 Pennsylvania Avenue NW, Washington, DC, 20500"

I'm using this table for an autosuggest search in my application so I need to use this type of query:

SELECT * FROM addresses WHERE address LIKE '123 Main St%';

I created this index:

CREATE INDEX address_idx ON addresses (address);

But the problem is it's taking around 1 sec which is too slow.

Here's the query plan:

EXPLAIN SELECT * FROM addresses WHERE address LIKE '123 Main St%';
----
Seq Scan on addresses  (cost=0.00..161309.76 rows=740 width=41)
  Filter: (address ~~ '123 Main St%'::text)

I tried creating a few types of gin indexes but they either had no effect or made the query slower. I'm not sure if I was using them correctly though.

Any ideas on how to create an index that's optimized for this kind of query?


EDIT

The best solution found so far is to use a text range scan:

SELECT *
FROM addresses
WHERE address >= '123 Main St' AND
      address <= concat('123 Main St', 'z');
4
  • Will this always be a prefix search? Then you can try WHERE address BETWEEN '123 Main St' AND '123 Main Su'. That should produce a range scan on the index. Commented Mar 9, 2016 at 0:57
  • 1
    The index should work fine for this query. Perhaps there is an incompatibility with the text types somehow that prevents the use of the index. Commented Mar 9, 2016 at 1:39
  • @Thilo Thanks! Using BETWEEN produces the same results as the LIKE query and reduced the time down to 13ms. To answer your question, yes this will always be a prefix search. The only thing I don't like about this method is I'll have to come up with the next letter in the alphabet or the next number instead of using a wildcard symbol. Is there any other way to produce a range scan without having to write in that kind of logic? Commented Mar 9, 2016 at 1:51
  • Ideally, the query planner should have detected the prefix search and transformed it into a BETWEEN query. And even if not, worst case should have been an index scan (instead of a table scan) -- not that that would have been any faster in this case. Commented Mar 9, 2016 at 2:44

2 Answers 2

5

This is an elaboration on the between method and too long for a comment.

If you are using standard ASCII characters, you can use the tilde-trick:

SELECT *
FROM addresses
WHERE address >= '123 Main St' AND
      address <= concat('123 Main St', '~');

Tilde has a larger ASCII value than other characters.

I do note that Postgres should use the index for the LIKE query as well. My guess is that the problem is something to do with compatibility of the types. Perhaps if you converted the pattern to a varchar(), Postgres would use the index.

Sign up to request clarification or add additional context in comments.

8 Comments

Thanks for the reply. Pretty clever indeed but I'm running into trouble with that query: ERROR: argument of AND must be type boolean, not type text - on line 4. I'm using standard ASCII characters btw. Could you elaborate on what you mean by convert the pattern to a varchar()? Do you mean the column?
@Tyler . . . That might be due to operator precedence.
Removing the double pipe and adding the tilde, i.e. '123 Main St~', solves the error but returns no results. I also tried '123 Main S~' which didn't work. Referencing the ASCII table, it seems like 'z' is the largest ASCII value character that I can use to return any results. Edit: concat('123 Main St', 'z') works.
@Tyler . . . Interesting. The tilde has worked for me in other contexts. At least the z works, though.
Thanks for your help. I think the range scan with your concat suggestion is the best solution!
|
4

Three things you could try:

  1. If your database is on 'C' locale (which you can check with \l on the psql prompt) then the regular Btree indexes should help in optimizing a LIKE 'abc%' type of query.
  2. If not, you could try using a suitable operator class when creating the Btree index. For e.g. CREATE INDEX tbl_col_text_pattern_ops_idx ON tbl(col text_pattern_ops);
  3. If that doesn't work, you could also try using the GiST / GIN, more detailing for which is given here.

If you'd like to know more you should read Erwin's StackOverflow answer here, that details how different Postgres indexes work with LIKE / ILIKE.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.