Abstract:
For a semester project at my university I need to implement an entity-level access module for SQLAlchemy. ORM queries should return only those objects that the user has access to. For example:
# python-pseudocode
# SomeModel contains multiple rows in the database, but some_user only has access to some_obj1 and some_obj2
session.query(SomeModel).all()
[] # returns empty list because user is not set
ACL.set_user(some_user)
session.query(SomeModel).all()
['some_obj1', 'some_obj2'] # returns the objects that the user has access to
Problem:
I have already implemented that by extending SQLAlchemy's BaseQuery and overriding its iterator, pretty much as explained here, but my lecturer pointed out that this is not the right approach because of its poor efficiency. He noted that I am filtering objects after retrieving them from the database, so if there are a million rows in the DB and the user only has access to several of them, I am retrieving those million objects for nothing. He suggested that I should intercept the SQL statement right before its execution.
I did some research and found SQLAlchemy events; before_cursor_execute seems to be a good place to intercept the statement. My idea was to parse the statement and inject WHERE <tablename>.id IN <ids_that_user_have_access_to>. Unfortunately, I ran into three problems:
- I need to be sure that I inject the WHERE clause in the right place in the statement. My intuition is that it's easy when the query is simple, but it might be tricky with more complex queries. Is there a way to insert WHERE in a place that will always work?
- When it comes to parsing, the statement intercepted in the before_cursor_execute event has this weird formatting with question marks, something like: SELECT sm.id AS sm_id, sm.some_field AS sm_some_field FROM sm WHERE sm.id = ?. The parameters associated with the statement are in the parameters tuple (for the example statement it would be (1, )), which is passed to the function along with statement. I tried parsing the statement with moz_sql_parser but, of course, question marks are not valid in an SQL statement, so it cannot be parsed. How can I parse such a statement?
- Even if I manage to parse the statement and inject the WHERE clause, how would I know at which position in the newly created parameters tuple I should place the appropriate parameter?
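The question marks are the DB-API "qmark" parameter style (the one the stdlib sqlite3 driver uses), so the statement text and its values travel separately. One possible way to sidestep both the parsing and the parameter-position problems is to nest the original statement inside an outer SELECT: the outer placeholders come textually after every inner one, so the extra values can simply be appended to the tuple. Below is a minimal sketch using only sqlite3; the helper wrap_with_acl, its id_column argument, the alias acl_src and the sample table sm are assumptions made for illustration, not part of SQLAlchemy.

```python
import sqlite3


def wrap_with_acl(statement, parameters, id_column, allowed_ids):
    """Nest a qmark-style SELECT in an outer query restricted to allowed_ids.

    Hypothetical helper: the original statement is never parsed, only wrapped.
    """
    if not allowed_ids:
        # "IN ()" is not portable SQL, so use a condition that is never true.
        return f"SELECT * FROM ({statement}) AS acl_src WHERE 1 = 0", tuple(parameters)
    placeholders = ", ".join("?" for _ in allowed_ids)
    wrapped = (
        f"SELECT * FROM ({statement}) AS acl_src "
        f"WHERE acl_src.{id_column} IN ({placeholders})"
    )
    # Outer placeholders follow all inner ones, so the new values go last.
    return wrapped, tuple(parameters) + tuple(allowed_ids)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sm (id INTEGER PRIMARY KEY, some_field TEXT)")
conn.executemany(
    "INSERT INTO sm VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
)

# Statement and parameters in the shape the question describes.
statement = (
    "SELECT sm.id AS sm_id, sm.some_field AS sm_some_field "
    "FROM sm WHERE sm.id > ?"
)
parameters = (1,)

wrapped, new_parameters = wrap_with_acl(statement, parameters, "sm_id", [2, 4, 99])
rows = conn.execute(wrapped, new_parameters).fetchall()
print(rows)
```

In SQLAlchemy, a before_cursor_execute listener registered with retval=True can return a replaced (statement, parameters) pair, which is where a helper like this could plug in. Two caveats: if the inner statement has a LIMIT or OFFSET, the filter is applied after it, which changes the result; and the outer query still needs to know the labelled column name (sm_id here), which has to come from somewhere other than the SQL string.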
<ids_that_user_have_access_to> is in the order of millions then you still have the same problem. Ideally you want the selection and filtering to be done in the database in a single query. This ought to be possible if the access control criteria are also stored in the database.
SELECT * FROM (…) WHERE id IN <list>
* in statement wrapper. If not, I still need to parse the original statement to extract column names.
* shouldn't be a problem, but there is still a need for parsing the statement. First of all, I need to modify only SELECT statements; the easiest way is checking whether the string contains SELECT, but I don't know if that's the right approach. Second of all, I need to know the table name to build the condition. WHERE id IN <list> won't work because id won't be recognised; rather, <tablename>_id is required.
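The suggestion to keep the access-control criteria in the database can be sketched as follows: instead of shipping a list of ids, the wrapper filters through a subquery on an ACL table, so only one extra parameter is appended no matter how many rows the user may see. This is a minimal sketch under stated assumptions: the table acl(user_id, object_id), the helpers is_select and restrict_to_user, and the alias acl_src are all hypothetical, and the prefix check for SELECT is only a heuristic (it would miss, for example, statements that start with WITH).

```python
import sqlite3


def is_select(statement):
    """Heuristic: the statement starts with SELECT, not merely contains it."""
    return statement.lstrip().upper().startswith("SELECT")


def restrict_to_user(statement, parameters, id_column, user_id):
    """Nest a SELECT and filter it through a hypothetical acl table."""
    if not is_select(statement):
        # Leave INSERT/UPDATE/DELETE and anything else untouched.
        return statement, tuple(parameters)
    wrapped = (
        f"SELECT * FROM ({statement}) AS acl_src "
        f"WHERE acl_src.{id_column} IN "
        "(SELECT object_id FROM acl WHERE user_id = ?)"
    )
    # A single appended parameter, regardless of how many ids are allowed.
    return wrapped, tuple(parameters) + (user_id,)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sm (id INTEGER PRIMARY KEY, some_field TEXT)")
conn.execute("CREATE TABLE acl (user_id INTEGER, object_id INTEGER)")
conn.executemany("INSERT INTO sm VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO acl VALUES (?, ?)", [(7, 1), (7, 3), (8, 2)])

select_sql = "SELECT sm.id AS sm_id, sm.some_field AS sm_some_field FROM sm"
wrapped, params = restrict_to_user(select_sql, (), "sm_id", 7)
visible = sorted(conn.execute(wrapped, params).fetchall())
print(visible)

# Contains the word SELECT but is not a SELECT statement, so it is not wrapped.
insert_sql = "INSERT INTO sm (some_field) SELECT some_field FROM sm WHERE id = ?"
untouched = restrict_to_user(insert_sql, (1,), "sm_id", 7)
```

A plain substring test for SELECT would have wrapped the INSERT above, which is why the sketch checks the prefix instead. With several protected tables, the acl table would presumably also need a column identifying the object type; that is left out here.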