My Python program runs an endless loop that detects actions and starts a thread (running a further Python function) for each one. Every one of these threads uses a SQLAlchemy object to operate on the same MySQL DB. The threads are started over a period of many hours or even days, sometimes several at once, depending on user activity. What is the best way to use the DB? Can all of the threads share one global db object, or should each of them create a new one?
1 Answer
Note that a SQLAlchemy session (and a connection) is not thread safe.
The Session object is entirely designed to be used in a non-concurrent fashion, which in terms of multithreading means “only in one thread at a time”.
In a nutshell, every session has a backing store that tracks all objects (model instances) that are added, removed, or modified within the session. Sharing a session and its tracked objects across threads does not play well with this backing store, because nothing guarantees thread safety.
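To make the "backing store" concrete, here is a minimal single-threaded sketch. It assumes SQLAlchemy 1.4 or newer, and it uses an in-memory SQLite engine and a hypothetical `User` model purely for illustration (neither comes from the question, which targets MySQL).

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class User(Base):
    """Hypothetical model, used only to have something to track."""
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))


# Assumption: in-memory SQLite stands in for the real MySQL engine.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = sessionmaker(bind=engine)()

user = User(name="alice")
session.add(user)

# The session records the pending object until it is flushed.
pending = user in session.new

session.commit()

# After the commit, looking the row up by primary key hands back the
# very same Python object the session is already tracking.
same_object = session.get(User, user.id) is user

session.close()
```

All of this bookkeeping lives inside the one `session` object, which is why two threads mutating it at the same time can corrupt its state.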
You can use scoped_session, which provides scoped management of session objects (their connections still come from the engine's connection pool). This way your sessions are bound to a thread-local scope, and each thread can use its own single session without worrying about concurrency.
We call this notion thread local storage, which means, a special object is used that will maintain a distinct object per each application thread. Python provides this via the threading.local() construct.
A scoped_session uses threading.local() as storage, so a single session is maintained for all calls made within one thread. Callers from a different thread get a different session object from the scoped_session.
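The idea can be illustrated with the standard library alone. The sketch below is a toy registry built on threading.local(); it is not SQLAlchemy's implementation, and `ToyThreadLocalRegistry` and `worker` are hypothetical names. A plain `object` stands in for a session factory.

```python
import threading


class ToyThreadLocalRegistry:
    """Toy registry: hands out one object per thread, created on first use."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def __call__(self):
        # Each thread sees its own attributes on the threading.local object.
        try:
            return self._local.value
        except AttributeError:
            self._local.value = self._factory()
            return self._local.value

    def remove(self):
        # Discard the calling thread's object, if it has one.
        try:
            del self._local.value
        except AttributeError:
            pass


# Assumption: `object` is a stand-in for a real session factory.
registry = ToyThreadLocalRegistry(object)

seen = {}


def worker(name):
    first = registry()
    second = registry()  # same thread, so the same object comes back
    seen[name] = (first, second)


threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Within each thread both calls return the same object, while the two threads end up with different objects; scoped_session gives sessions the same per-thread behaviour.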
from sqlalchemy.orm import scoped_session
from sqlalchemy.orm import sessionmaker
# Note that session_factory, some_engine,
# and scoped_session are global objects
session_factory = sessionmaker(bind=some_engine)
Session = scoped_session(session_factory)
# Calls to `Session` will return a `session` object
# that is backed by a thread-local store
some_session = Session()
# Calling `Session` again somewhere down the line,
# within the same thread, yields the same session object
some_other_session = Session()
some_session is some_other_session # True
Above, some_session is an instance of Session, which we can now use to talk to the database. This same Session is also present within the scoped_session registry we’ve created. If we call upon the registry a second time, we get back the same Session:
# All objects managed by `some_session` are stored in thread-local storage
some_session.add(..)
some_session.delete(..)  # a session has no remove() method; delete() marks an object for deletion
some_session.query(..)
some_session.commit() # or .rollback()
# Calling `Session.remove()` closes `some_session`
# and returns the `connection` back to the pool for reuse
Session.remove()
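It is important to call Session.remove() (Session here being the scoped_session) instead of just some_session.close(). remove() closes the current session and also discards it from the registry, so the next call to Session() in that thread creates a fresh session.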
The scoped_session.remove() method first calls Session.close() on the current Session, which has the effect of releasing any connection/transactional resources owned by the Session first, then discarding the Session itself. “Releasing” here means that connections are returned to their connection pool and any transactional state is rolled back, ultimately using the rollback() method of the underlying DBAPI connection.
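One way to make sure every thread cleans up is to wrap the session's lifetime in a context manager. This is a sketch under assumptions, not the only way to do it: `session_scope` and `worker` are hypothetical helper names, and an in-memory SQLite engine replaces the MySQL engine so the snippet is self-contained.

```python
import threading
from contextlib import contextmanager

from sqlalchemy import create_engine, text
from sqlalchemy.orm import scoped_session, sessionmaker

# Assumption: in-memory SQLite stands in for the real MySQL engine.
engine = create_engine("sqlite://")
Session = scoped_session(sessionmaker(bind=engine))


@contextmanager
def session_scope():
    """Yield this thread's session; commit on success, roll back on error."""
    session = Session()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        # Close the session and discard it from the thread-local registry.
        Session.remove()


results = {}


def worker(name):
    with session_scope() as session:
        value = session.execute(text("SELECT 1")).scalar()
        # Keep a reference so the sessions can be compared afterwards.
        results[name] = (session, value)


threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The engine and the `Session` registry are created once and shared as globals, while each thread works with its own session and always ends with Session.remove(), even when the work raises.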
4 Comments
DB. In a threaded application, the engine and connection pool are thread safe. However, you should use scoped_session to manage the creation of new session objects in each thread in a thread-safe way; those sessions draw their connections from the engine's pool.