0

Abstract:
For purpose of semester project on my university I need to implement entity-level access module for SQLAlchemy. ORM queries should return only those objects that user has access to. For example:

# python-pseudocode
# SomeModel contains multiple rows in database, but some_user only have acces to some_obj1, some_obj2

session.query(SomeModel).all()
[]  # returns empty list because user is not set

ACL.set_user(some_user)
session.query(SomeModel).all()
['some_obj1', 'some_obj2']  # returns object that user have acces to

Problem:
I have already implemented that, by extending BaseQuery of SQLAlchemy and overriding iterator, pretty much something like explained here, but my lecturer pointed out that is not right approach due to its poor-efficiency. He submited that I am filtering objects after retrieving them from database, so if there are million of rows in DB, and user only have acces to several of them I am retrieving those million objects for nothing. He suggested that I should intecept SQL statement riqht before its execution.

I did some research and I found SQLAlchemy events, before_cursor_execute seems to be good place to intercept statement. I had idea that I could parse statement and inject WHERE <tablename>.id IN <ids_that_user_have_access_to>. Unfortunately, I faced three problems:

  1. I need to be sure that I inject WHERE clause in right place in statement. I have this intuition that it's easy when query is simple, but with more complex queries it might be tricky. Is there way to insert WHERE in place that will always work fine?
  2. When it comes to parsing, statement intercepted in before_cursor_execute event, has this weird formatting with question marks, something like: SELECT sm.id AS sm_id, sm.some_field AS sm_some_field FROM sm WHERE sm.id = ? , parameters associated with statement are in parameters tuple (for exemplary statement it would be (1, )) which is passed to function along with statement. I tested parsing statement with moz_sql_parser but of course, question marks are not valid in SQL statement, so it cannot be parsed. How to parse such statement?
  3. Even when I'll be able to parse statement and inject WHERE clause, how would I know in which position in newly created parameters tuple should I place appropriate parameter?
4
  • 1
    if <ids_that_user_have_access_to> is in the order of millions then you still have the same problem. Ideally you want to selection and filtering to be done in the database in a single query. This ought to be possible if the access control criteria are also stored in the database. Commented Nov 25, 2020 at 12:24
  • Also, intercepting, parsing, dissecting, and modifying raw SQL could prove to be very difficult and error-prone for queries of arbitrary size and complexity. You might be better off wrapping the original query in a SELECT * FROM (…) WHERE id IN <list> Commented Nov 25, 2020 at 12:37
  • Actually, wrapping might be way to go (I haven't thought about it in first place), but I don't know if it's right to use * in statement wrapper. If not, I still need to parse original statement to extract column names. Commented Nov 25, 2020 at 17:41
  • Turned out, using * shouldn't be a problem, but there is still need for parsing statement. First of all - I need to modify only SELECT statements, easiest way is checking whether string contains SELECT but I don't know if it's right approach. Second of all - I need to know tablename to build condition. WHERE id IN <list> won't work because id won't be recognised, rather <tablename>_id is required. Commented Nov 27, 2020 at 10:49

1 Answer 1

1

I am replying to myself but I found suitable solution.

Turned out before_execute, event is better place to intercept and modify query, because uncompiled SQL construct ClauseElement is available. This makes modyfing query much easier.

@event.listens_for(engine, 'before_execute', retval=True)
def intercept(conn, clauseelement, multiparams, params):
    from sqlalchemy.sql.selectable import Select

    # check if it's select statement
    if isinstance(clauseelement, Select):
        # 'froms' represents list of tables that statement is querying
        table = clauseelement.froms[0]

        # adding filter in clause
        clauseelement = clauseelement.where(table.c.id.in_(list_of_ids))

    return clauseelement, multiparams, params

For more, read about before_execute event and expression API.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.