Optimizing PostgreSQL with functional indexes?

Question

While working on some performance tuning, I came across this posting from Instagram's engineering team:

http://instagram-engineering.tumblr.com/post/40781627982/handling-growth-with-postgres-5-tips-from-instagram

On some of our tables, we need to index strings (for example, 64 character base64 tokens) that are quite long, and creating an index on those strings ends up duplicating a lot of data. For these, Postgres’ functional index feature can be very helpful:
CREATE INDEX CONCURRENTLY on tokens (substr(token), 0, 8)
While there will be multiple rows that match that prefix, having Postgres match those prefixes and then filter down is quick, and the resulting index was 1/10th the size it would have been had we indexed the entire string.

This looked like a good idea, so I tried it: we have a lot of items that are keyed by a checksum.

Our results were not good. I'm wondering if anyone else has had luck.

First off, the blog post looks wrong:

CREATE INDEX CONCURRENTLY on tokens (substr(token), 0, 8)

Shouldn't that be...

CREATE INDEX CONCURRENTLY on tokens (substr(token, 0, 8));
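Even with the parentheses fixed, there is a second problem. PostgreSQL counts string positions from 1, so substr(token, 0, 8) spans positions 0 through 7 and returns only the first 7 characters. That also explains the rows=0 in the last plan below: a 7-character substr(hash, 0, 8) can never equal the 8-character literal 'eae1d172'.

A minimal sketch of a true 8-character prefix index follows. It assumes a hypothetical tokens table with a token text column; the table and index names are illustrative and do not come from the blog post. left(token, 8) would work equally well.

```sql
-- String positions start at 1 in PostgreSQL; position 0 precedes the string.
SELECT substr('eae1d1728963f107', 0, 8) AS from_zero,  -- 'eae1d17'  (7 characters)
       substr('eae1d1728963f107', 1, 8) AS from_one,   -- 'eae1d172' (8 characters)
       left('eae1d1728963f107', 8)      AS via_left;   -- 'eae1d172' (8 characters)

-- Hypothetical table standing in for the blog post's "tokens".
CREATE TABLE IF NOT EXISTS tokens (token text NOT NULL);

-- Index the real 8-character prefix. On a live table you would add
-- CONCURRENTLY, which cannot run inside a transaction block.
CREATE INDEX IF NOT EXISTS tokens_token_prefix_idx ON tokens (substr(token, 1, 8));
```

Queries then have to use the same expression, for example WHERE substr(token, 1, 8) = substr($1, 1, 8) AND token = $1.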

One of our fields was based on a 40-character hash, so I tried:

CREATE INDEX __speed_idx_test_8 on foo (substr(hash, 0, 8));

The query planner wouldn't use it.

So I tried:

CREATE INDEX __speed_idx_test_20 on foo (substr(hash, 0, 20));

The query planner still wouldn't use it.

Then I tried:

CREATE INDEX __speed_idx_test_40 on foo (substr(hash, 0, 40));

Still, the planner wouldn't use it.

What if we disable sequential scans?

set enable_seqscan=false;

Nope.
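Disabling sequential scans changes nothing here because enable_seqscan = off does not forbid sequential scans. It only adds a very large cost penalty to them, which is the 10000000000.00 startup cost in the Seq Scan plan below. When no index matches the WHERE clause, the planner still ends up with a sequential scan.

The sketch below tests this in isolation. It assumes a hypothetical foo table with a char(40) hash column like the question's, plus an 8-character prefix index; all names are illustrative.

```sql
-- Hypothetical stand-in for the question's table and a prefix index.
CREATE TABLE IF NOT EXISTS foo (id serial PRIMARY KEY, hash char(40) NOT NULL);
CREATE INDEX IF NOT EXISTS foo_hash_prefix_idx ON foo (substr(hash, 1, 8));

BEGIN;
-- SET LOCAL lasts only until the end of this transaction.
SET LOCAL enable_seqscan = off;

-- Plain equality on hash: the prefix index does not match this condition,
-- so the planner still ends up with a (penalized) sequential scan.
EXPLAIN SELECT * FROM foo
WHERE hash = 'eae1d1728963f107fa7d8136bcf7c72572896e1d';

-- Same expression as the index: an index-based plan can now be considered.
EXPLAIN SELECT * FROM foo
WHERE substr(hash, 1, 8) = 'eae1d172'
  AND hash = 'eae1d1728963f107fa7d8136bcf7c72572896e1d';

ROLLBACK;
```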

Let's go back to our original index.

CREATE INDEX __speed_idx_original on foo (hash);
set enable_seqscan = True;

And that works.

Then I thought: maybe the query has to call the same function in order to use a functional index. So I tried changing the query:

old:

select * from foo where hash = '%s';

new:

select * from foo where substr(hash,0,8) = '%s' and hash = '%s';

And that worked.
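The extra condition can at least be kept out of the application's string formatting. One way is to derive the prefix from the same bound parameter inside the statement, so the client passes the hash only once.

Here is a minimal sketch using a server-side prepared statement. It assumes the same hypothetical foo table and prefix index; the statement name find_foo_by_hash is made up. It also replaces the '%s' interpolation with a bound parameter. Both the index and the query use substr(hash, 1, 8) so that the expressions match.

```sql
-- Hypothetical stand-in for the question's table and a prefix index.
CREATE TABLE IF NOT EXISTS foo (id serial PRIMARY KEY, hash char(40) NOT NULL);
CREATE INDEX IF NOT EXISTS foo_hash_prefix_idx ON foo (substr(hash, 1, 8));

-- $1 is bound once; the prefix condition is computed from it so it matches
-- the indexed expression, and the full comparison removes prefix collisions.
PREPARE find_foo_by_hash(text) AS
    SELECT * FROM foo
    WHERE substr(hash, 1, 8) = substr($1, 1, 8)
      AND hash = $1;

EXECUTE find_foo_by_hash('eae1d1728963f107fa7d8136bcf7c72572896e1d');

-- Prepared statements last for the whole session unless deallocated.
DEALLOCATE find_foo_by_hash;
```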

Does anyone know if it is possible to make this work without adding an extra search condition? I'd rather not do that, but looking at the file size and speed improvements... wow.

And if you're wondering what the EXPLAIN ANALYZE output was:

-- seq scan
Seq Scan on foo  (cost=10000000000.00..10000073130.77 rows=1 width=1921) (actual time=373.785..1563.551 rows=1 loops=1)
  Filter: (hash = 'eae1d1728963f107fa7d8136bcf7c72572896e1d'::bpchar)
  Rows Removed by Filter: 450252
Total runtime: 1563.687 ms


-- index scan
Index Scan using __speed_idx_original on foo  (cost=0.00..16.53 rows=1 width=1920) (actual time=0.060..0.061 rows=1 loops=1)
  Index Cond: (hash = 'eae1d1728963f107fa7d8136bcf7c72572896e1d'::bpchar)
Total runtime: 1.501 ms


-- index scan with substring function
Index Scan using __speed_idx_test_8 on foo  (cost=0.00..16.37 rows=1 width=1913) (actual time=0.134..0.134 rows=0 loops=1)
  Index Cond: (substr((hash)::text, 0, 8) = 'eae1d172'::text)
  Filter: (hash = 'eae1d1728963f107fa7d8136bcf7c72572896e1d'::bpchar)
Total runtime: 0.216 ms
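To compare the file sizes mentioned above, you can read the on-disk size of every index on a table from the system catalogs. This sketch assumes the question's table is named foo. The CREATE TABLE line is only a hypothetical stand-in so the snippet runs on its own.

pg_relation_size, pg_size_pretty and pg_get_indexdef are built-in PostgreSQL functions. The resulting numbers depend entirely on the data.

```sql
-- Hypothetical stand-in so the query has something to inspect.
CREATE TABLE IF NOT EXISTS foo (id serial PRIMARY KEY, hash char(40) NOT NULL);

-- One row per index on foo, largest first, with its definition and size.
SELECT i.indexrelid::regclass                         AS index_name,
       pg_get_indexdef(i.indexrelid)                  AS definition,
       pg_size_pretty(pg_relation_size(i.indexrelid)) AS size
FROM pg_index AS i
WHERE i.indrelid = 'foo'::regclass
ORDER BY pg_relation_size(i.indexrelid) DESC;
```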

Kirk Roybal · Accepted Answer · 2014-06-05 18:28:36Z


It only works when the WHERE clause uses the same function expression that was indexed. That expression acts as a hint to the query planner that the scalar value returned by the function is stored in the index. This only works with immutable functions. Volatile functions (functions that don't return the same result on every call, like random()) cannot be indexed this way.
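The sketch below illustrates both points in the answer, assuming the same hypothetical foo table. hash_prefix is a made-up helper, not a built-in.

A function declared IMMUTABLE can be used in an index, and the query has to call that same function for the planner to consider the index. An index expression involving a volatile function such as random() is rejected when the index is created.

```sql
-- Hypothetical stand-in for the question's table.
CREATE TABLE IF NOT EXISTS foo (id serial PRIMARY KEY, hash char(40) NOT NULL);

-- Hypothetical helper; IMMUTABLE promises the same output for the same input.
CREATE OR REPLACE FUNCTION hash_prefix(h text) RETURNS text
    LANGUAGE sql IMMUTABLE STRICT
    AS $$ SELECT substr(h, 1, 8) $$;

CREATE INDEX IF NOT EXISTS foo_hash_prefix_fn_idx ON foo (hash_prefix(hash));

-- The WHERE clause repeats the indexed expression; the second condition
-- filters out other hashes that share the same prefix.
SELECT * FROM foo
WHERE hash_prefix(hash) = hash_prefix('eae1d1728963f107fa7d8136bcf7c72572896e1d')
  AND hash = 'eae1d1728963f107fa7d8136bcf7c72572896e1d';

-- A volatile function in an index expression is refused; catch and report it.
DO $$
BEGIN
    EXECUTE 'CREATE INDEX foo_random_idx ON foo ((hash || random()::text))';
EXCEPTION WHEN others THEN
    RAISE NOTICE 'rejected: %', SQLERRM;
END $$;
```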


2 Comments

Jonathan Vanasco Over a year ago

Thanks. That's what I thought / feared.

Craig Ringer Over a year ago

Exactly. There is no way to make PostgreSQL automagically realise that this is a prefix index. (Well, there is, but it involves adding a feature to PostgreSQL - feel free to fund the work!). So you'll need to modify your queries to take advantage of the expression index.
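To follow that advice without repeating both conditions in every caller, one option is to wrap them in a set-returning SQL function, so application code passes the hash once. This sketch assumes the same hypothetical foo table and prefix index; foo_by_hash is a made-up name.

Whether the planner actually picks the prefix index still depends on table statistics, so it is worth checking with EXPLAIN ANALYZE.

```sql
-- Hypothetical stand-in for the question's table and a prefix index.
CREATE TABLE IF NOT EXISTS foo (id serial PRIMARY KEY, hash char(40) NOT NULL);
CREATE INDEX IF NOT EXISTS foo_hash_prefix_idx ON foo (substr(hash, 1, 8));

-- Hypothetical wrapper: the prefix condition that matches the expression
-- index lives here, so callers only supply the full hash.
CREATE OR REPLACE FUNCTION foo_by_hash(h text) RETURNS SETOF foo
    LANGUAGE sql STABLE
    AS $$
        SELECT * FROM foo
        WHERE substr(hash, 1, 8) = substr(h, 1, 8)
          AND hash = h
    $$;

SELECT * FROM foo_by_hash('eae1d1728963f107fa7d8136bcf7c72572896e1d');
```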
