
We are using Postgres for an RBAC permissions model with groups and are trying to work out which indexes we need. Our DB has the following schema:

Subjects Table

id, external_type, external_id, group_id

Resources Table

id, group_id, external_type, external_id, role_id
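For concreteness, the two tables can be sketched as PostgreSQL DDL. The column types (text for external_type, bigint for the ids) are assumptions; the question lists only column names.

```sql
-- Hypothetical DDL for the two tables described above; column types are assumed.
CREATE TABLE IF NOT EXISTS subjects (
    id            bigserial PRIMARY KEY,
    external_type text   NOT NULL,  -- e.g. 'user'
    external_id   bigint NOT NULL,  -- id of the subject in the external system
    group_id      bigint NOT NULL   -- one row per group membership
);

CREATE TABLE IF NOT EXISTS resources (
    id            bigserial PRIMARY KEY,
    group_id      bigint NOT NULL,  -- group that was granted a role on the resource
    external_type text   NOT NULL,  -- e.g. 'order'
    external_id   bigint NOT NULL,  -- id of the resource in the external system
    role_id       bigint NOT NULL   -- role that group holds on this resource
);
```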

Both tables are meant to answer a single question:

Can subject [S] perform action [A] on resource [R]?

So we need to retrieve all the roles a user has on a resource, across all the permission groups the user belongs to.

  • A subject can have K rows in the subjects table, one for each group it is a member of.
  • A resource can have M rows in the resources table, one for every group that was assigned a role on this resource.
  • We chose to denormalize resources, groups and roles into one table for read performance.
  • We chose not to also denormalize subjects into the same table, to avoid updating many records on every group structure change.
  • Both K and M can be very large: a user can be in many groups, and a resource can belong to many groups.

So the query will be:

-- In PostgreSQL, string literals take single quotes; double quotes denote identifiers.
SELECT resources.role_id
FROM resources
INNER JOIN subjects
ON resources.group_id = subjects.group_id
WHERE subjects.external_type = 'user' AND subjects.external_id = 123
  AND resources.external_type = 'order' AND resources.external_id = 456;

We decided to define the following indexes:

Subjects: <external_type, external_id>, <group_id>
Resources: <external_type, external_id>, <group_id>
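The same indexes written as CREATE INDEX statements. The index names are hypothetical, and the CREATE TABLE IF NOT EXISTS lines (with assumed column types) are there only so the snippet runs on its own.

```sql
-- Minimal table definitions (column types assumed) so the snippet runs standalone.
CREATE TABLE IF NOT EXISTS subjects (
    id bigserial PRIMARY KEY, external_type text NOT NULL,
    external_id bigint NOT NULL, group_id bigint NOT NULL);
CREATE TABLE IF NOT EXISTS resources (
    id bigserial PRIMARY KEY, group_id bigint NOT NULL,
    external_type text NOT NULL, external_id bigint NOT NULL,
    role_id bigint NOT NULL);

-- Subjects: compound index for the WHERE filter, separate index for the join key.
CREATE INDEX IF NOT EXISTS subjects_ext_idx    ON subjects  (external_type, external_id);
CREATE INDEX IF NOT EXISTS subjects_group_idx  ON subjects  (group_id);

-- Resources: the same layout.
CREATE INDEX IF NOT EXISTS resources_ext_idx   ON resources (external_type, external_id);
CREATE INDEX IF NOT EXISTS resources_group_idx ON resources (group_id);
```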

Can someone please explain how an inner join whose WHERE clause filters on both tables uses indexes? Are the two tables filtered in parallel and then combined through the ON condition, or is only one table's index used for the WHERE filter, with the other table then joined through the ON condition?

Should we use different compound indexes, for example by adding group_id to them?

Any reference to a similar use case, or to choosing indexes for complex JOIN queries, would be helpful.

  • Your indexes are OK. The query will use a nested loop over a range index scan and then an index seek on the index of the other table. I don't think it matters which table is primary or secondary in this case. Now: 1) Is the query slow? 2) How many rows is the query returning? This could affect the heap reading, but should not be bad. Commented Jun 19, 2020 at 13:12
  • Is this the actual, bona fide query? If so, you could add role_id to the index, which could marginally improve performance. Commented Jun 19, 2020 at 13:13
  • How selective are the WHERE conditions on the respective queries, that is, how many rows will pass? How many rows will the actual query produce? I am asking for estimates. Commented Jun 19, 2020 at 13:20
  • @TheImpaler the system is still in development. 1) We are working on creating a test case and will run EXPLAIN on it. 2) Although there could be many matches for each part of the WHERE clause (matching <external_type, external_id> on each table), the intersection should return very few results, which also answers your question, @Laurenz Albe. Commented Jun 19, 2020 at 16:18
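To answer the selectivity questions in these comments with real numbers, one option is to count the rows each WHERE condition lets through (K and M in the question) and the rows the join itself returns. The literals are the ones from the question; the table definitions use assumed column types and are repeated only so the snippet is self-contained.

```sql
-- Minimal table definitions (column types assumed) so the snippet runs standalone.
CREATE TABLE IF NOT EXISTS subjects (
    id bigserial PRIMARY KEY, external_type text NOT NULL,
    external_id bigint NOT NULL, group_id bigint NOT NULL);
CREATE TABLE IF NOT EXISTS resources (
    id bigserial PRIMARY KEY, group_id bigint NOT NULL,
    external_type text NOT NULL, external_id bigint NOT NULL,
    role_id bigint NOT NULL);

-- K: group memberships of the subject.
SELECT count(*) AS k_subject_rows
FROM subjects
WHERE external_type = 'user' AND external_id = 123;

-- M: group/role rows for the resource.
SELECT count(*) AS m_resource_rows
FROM resources
WHERE external_type = 'order' AND external_id = 456;

-- Rows the join actually produces (at most K * M).
SELECT count(*) AS joined_rows
FROM resources r
JOIN subjects s ON r.group_id = s.group_id
WHERE s.external_type = 'user' AND s.external_id = 123
  AND r.external_type = 'order' AND r.external_id = 456;
```

If K and M are large but joined_rows is small, as the comment expects, that gap is exactly what a nested loop driven by a selective index on one side can exploit.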

1 Answer


This could be done in a variety of ways. It could read from both indexes and tables independently, then hash join or merge join the results. Or it could ignore one or both indexes and do a seq scan on the tables instead, if it thought that would be faster (because the indexes would return a large fraction of the rows). Or it could do a nested loop, where it combines the constants from the WHERE clause with the changing group_id from the ON clause to form a triple, which it looks up in the three-column index of one of the tables (the inner table). The outer table could be driven by its index (using just the first two columns, which are constant for the duration of a query) or by a sequential scan.
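One way to see these alternatives side by side is to discourage particular join methods for the current session with PostgreSQL's enable_* planner settings and compare the resulting plans with a plain EXPLAIN of the same query. These settings raise the estimated cost of a method rather than forbidding it outright, so treat this as a diagnostic sketch, not something to leave switched off in production. The table definitions (assumed column types) are included only so the snippet runs standalone.

```sql
-- Minimal table definitions (column types assumed) so the snippet runs standalone.
CREATE TABLE IF NOT EXISTS subjects (
    id bigserial PRIMARY KEY, external_type text NOT NULL,
    external_id bigint NOT NULL, group_id bigint NOT NULL);
CREATE TABLE IF NOT EXISTS resources (
    id bigserial PRIMARY KEY, group_id bigint NOT NULL,
    external_type text NOT NULL, external_id bigint NOT NULL,
    role_id bigint NOT NULL);

-- Discourage nested loops: a hash or merge join becomes more likely.
SET enable_nestloop = off;
EXPLAIN
SELECT r.role_id
FROM resources r
JOIN subjects s ON r.group_id = s.group_id
WHERE s.external_type = 'user' AND s.external_id = 123
  AND r.external_type = 'order' AND r.external_id = 456;
RESET enable_nestloop;

-- Discourage hash and merge joins instead: steers the planner toward a nested loop.
SET enable_hashjoin = off;
SET enable_mergejoin = off;
EXPLAIN
SELECT r.role_id
FROM resources r
JOIN subjects s ON r.group_id = s.group_id
WHERE s.external_type = 'user' AND s.external_id = 123
  AND r.external_type = 'order' AND r.external_id = 456;
RESET enable_hashjoin;
RESET enable_mergejoin;
```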

If you want to know what plan is being used, do an EXPLAIN or better an EXPLAIN (ANALYZE, BUFFERS) of the query.
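A minimal sketch of that, using the query from the question. Running ANALYZE first refreshes the planner statistics, which matters after bulk-loading test data. Note that EXPLAIN ANALYZE actually executes the query. The table definitions use assumed column types and are included only so the snippet runs standalone.

```sql
-- Minimal table definitions (column types assumed) so the snippet runs standalone.
CREATE TABLE IF NOT EXISTS subjects (
    id bigserial PRIMARY KEY, external_type text NOT NULL,
    external_id bigint NOT NULL, group_id bigint NOT NULL);
CREATE TABLE IF NOT EXISTS resources (
    id bigserial PRIMARY KEY, group_id bigint NOT NULL,
    external_type text NOT NULL, external_id bigint NOT NULL,
    role_id bigint NOT NULL);

-- Refresh planner statistics, e.g. after loading a large test data set.
ANALYZE subjects;
ANALYZE resources;

-- Show the chosen plan with actual row counts, timings and buffer (page) accesses.
EXPLAIN (ANALYZE, BUFFERS)
SELECT r.role_id
FROM resources r
JOIN subjects s ON r.group_id = s.group_id
WHERE s.external_type = 'user' AND s.external_id = 123
  AND r.external_type = 'order' AND r.external_id = 456;
```

In the output, the join node (Nested Loop, Hash Join or Merge Join) and the scan node under each table (Index Scan, Index Only Scan, Bitmap Heap Scan or Seq Scan) show which of the strategies above was chosen.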


4 Comments

Thanks for the thorough explanation. We are working on creating a test case with many records and will run EXPLAIN then. Just to verify though: would you recommend the indexes as presented: Subjects: <external_type, external_id>, <group_id>, Resources: <external_type, external_id>, <group_id>, OR: Subjects: <external_type, external_id, group_id>, Resources: <external_type, external_id, group_id>?
@Eliranf, your alternatives look the same to me; I don't spot a difference. I would add role_id at the end of the resources table's index, in the hope of getting an index-only scan.
Thanks a lot @jjanes. The difference is whether you meant one compound index per table with all 3 columns, or 2 different indexes per table: one compound index on <external_type, external_id> and one single-column index on group_id.
I see. I overlooked that in both your comment and your question. I don't see a value in the single-column index on group_id based on the one query you show. Group_id should be added to the compound index for one table, or the other, or both. My answer was based on the mistaken idea that it was already part of the multi-column indexes, not in only a single-column index. But using EXPLAIN to see what PostgreSQL plans to do is still a good idea.
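A sketch of the direction suggested in these comments: put group_id into the compound indexes and append role_id to the resources index so an index-only scan becomes possible. Index names and column types are assumptions. Index-only scans also rely on the visibility map, so on a table with many recent writes PostgreSQL may still visit the heap until the table has been vacuumed.

```sql
-- Minimal table definitions (column types assumed) so the snippet runs standalone.
CREATE TABLE IF NOT EXISTS subjects (
    id bigserial PRIMARY KEY, external_type text NOT NULL,
    external_id bigint NOT NULL, group_id bigint NOT NULL);
CREATE TABLE IF NOT EXISTS resources (
    id bigserial PRIMARY KEY, group_id bigint NOT NULL,
    external_type text NOT NULL, external_id bigint NOT NULL,
    role_id bigint NOT NULL);

-- Subject side: equality on the first two columns, group_id supplies the join key.
CREATE INDEX IF NOT EXISTS subjects_ext_group_idx
    ON subjects (external_type, external_id, group_id);

-- Resource side: same prefix, plus role_id so the SELECT list is covered by the index.
CREATE INDEX IF NOT EXISTS resources_ext_group_role_idx
    ON resources (external_type, external_id, group_id, role_id);

-- On PostgreSQL 11+, role_id could instead be a non-key INCLUDE column:
-- CREATE INDEX ... ON resources (external_type, external_id, group_id) INCLUDE (role_id);
```

With an index whose leading columns are (external_type, external_id, group_id), the inner side of a nested loop can be looked up with all three values, which is the triple lookup the answer describes; per the comment above, the single-column group_id indexes add little for this particular query.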
