
I'd like to cluster a PostgreSQL table based on information in other tables.

Imagine two tables - foos (id, baz_id, name) and bars (foo_id, name). bars.foo_id is a foreign key reference to foos.id.

I'd like to cluster the bars table so that it is ordered by foos.baz_id.

Is this possible?

  • Your sample schema doesn't actually show a relationship between the two tables (I'm guessing there should be a foo_id on foos). More importantly, could you explain why you want to cluster the data in this way / what you hope it will achieve? Commented Jun 6, 2014 at 19:47
  • @IMSoP My data is such that it's highly likely that bars related to foos that have the same baz_id will be returned in the same query. None of the other columns on bars can simulate that ordering (although foo_id isn't horrible). I expect this will decrease IO read time significantly on a several hundred million record table. Commented Jun 6, 2014 at 21:38

1 Answer


This can be accomplished using denormalization.

Add a bars.baz_id column, and add triggers or application-layer logic that keeps its values in sync with foos.baz_id.
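A minimal sketch of the trigger-based approach follows. It assumes foos.baz_id is an integer and that PL/pgSQL is available. The function and trigger names (bars_set_baz_id, foos_propagate_baz_id) are hypothetical, chosen only for illustration.

```sql
-- Assumption: baz_id is an integer; adjust the type to match foos.baz_id.
ALTER TABLE bars ADD COLUMN baz_id integer;

-- Backfill existing rows from their parent foo.
UPDATE bars
SET baz_id = foos.baz_id
FROM foos
WHERE foos.id = bars.foo_id;

-- Hypothetical trigger function: copy baz_id from the parent foo
-- whenever a bar is inserted or its foo_id changes.
CREATE FUNCTION bars_set_baz_id() RETURNS trigger AS $$
BEGIN
    NEW.baz_id := (SELECT f.baz_id FROM foos f WHERE f.id = NEW.foo_id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER bars_set_baz_id
BEFORE INSERT OR UPDATE OF foo_id ON bars
FOR EACH ROW EXECUTE PROCEDURE bars_set_baz_id();

-- Hypothetical trigger function: push a changed foos.baz_id down to
-- the dependent bars rows.
CREATE FUNCTION foos_propagate_baz_id() RETURNS trigger AS $$
BEGIN
    UPDATE bars SET baz_id = NEW.baz_id WHERE foo_id = NEW.id;
    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foos_propagate_baz_id
AFTER UPDATE OF baz_id ON foos
FOR EACH ROW EXECUTE PROCEDURE foos_propagate_baz_id();
```

On a table with several hundred million rows, the backfill UPDATE rewrites every row, so it may be worth running it in batches or during a maintenance window.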

Then add an index that uses the denormalized bars.baz_id column and cluster by that index.

CREATE INDEX index_bars_on_baz_id ON bars (baz_id);
CLUSTER bars USING index_bars_on_baz_id;

2 Comments

Note that CLUSTER in Postgres doesn't create a permanently clustered index as in some other DBMSes; it just physically reorders the rows that exist at that time according to the index. New data will still be added wherever it fits, until you run CLUSTER again to rearrange it.
It's also important to re-ANALYZE after each re-CLUSTER.
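
As a rough sketch, the periodic maintenance described in these comments could look like the following, assuming bars has already been clustered once with CLUSTER ... USING so that Postgres remembers which index to use:

```sql
-- Re-cluster using the index recorded by the earlier CLUSTER ... USING.
-- This takes an ACCESS EXCLUSIVE lock on bars for the duration.
CLUSTER bars;

-- Refresh planner statistics so they reflect the new physical ordering.
ANALYZE bars;
```

Because the table is locked against both reads and writes while CLUSTER runs, this is usually scheduled for a low-traffic window.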
