
In the case of SQLite, it is not clear whether we can easily commit right after each dataframe insert (assuming that auto-commit is off by default, following the Python database API convention).

Using the simplest SQLAlchemy API flow ―

import sqlalchemy as db

db_engine = db.create_engine("sqlite:///results.db")  # placeholder path
for batch in batches:            # placeholder loop over work items
    # slowly compute some_df, takes a lot of time
    some_df = compute(batch)     # compute() stands in for the real work
    some_df.to_sql("results", con=db_engine, if_exists="append")

How can we make sure that every .to_sql is committed?

For motivation, imagine that each write reflects the result of a potentially very long computation. We do not want to lose a huge batch of such computations, nor any single one of them, in case a machine goes down, or in case the Python SQLAlchemy engine object is garbage collected before all its writes have actually drained into the database.

I believe auto-commit is off by default, and for SQLite there is no way of changing that in the create_engine call. What might be the simplest, safest way of adding auto-commit behavior ― or explicitly committing after every dataframe write ― when using the simple .to_sql API?

Or must the code be refactored to use a different API flow to accomplish that?

  • db_engine = db_engine.execution_options(autocommit=True)? Commented Sep 29, 2019 at 16:59
  • This should work. I failed to find it myself while being lost in the somewhat complicated API docs of the involved three libraries. You can post that as an answer I guess. Commented Sep 29, 2019 at 17:07
  • SQLA engine is in autocommit mode by default, at least in versions up to 1.3. "in case the python sqlalchemy engine object is garbage collected before all its writes have actually drained in the database" seems like something that's not going to happen. Commented Sep 29, 2019 at 17:16
  • @IljaEveril are you absolutely sure? Commented Sep 29, 2019 at 17:19
  • Yes. The link you pasted refers to (ORM) sessions, which can use an engine as their bind, but are not engines. Commented Sep 29, 2019 at 17:21

2 Answers


You can set the connection to autocommit by:

db_engine = db_engine.execution_options(autocommit=True)



From https://docs.sqlalchemy.org/en/13/core/connections.html#understanding-autocommit:

The “autocommit” feature is only in effect when no Transaction has otherwise been declared. This means the feature is not generally used with the ORM, as the Session object by default always maintains an ongoing Transaction.

In your code you have not presented any explicit transactions, and so the engine used as the con is in autocommit mode (as implemented by SQLA).

Note that SQLAlchemy implements its own autocommit that is independent from the DB-API driver's possible autocommit / non-transactional features.
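For comparison, the driver-level flavor can be requested through SQLAlchemy's isolation_level parameter, which the pysqlite dialect accepts as "AUTOCOMMIT". A minimal sketch (the in-memory URL and table name t are placeholders):

```python
import sqlalchemy as db

# Driver-level autocommit: the DBAPI connection itself commits each
# statement, independently of SQLAlchemy's own autocommit logic.
engine = db.create_engine("sqlite://", isolation_level="AUTOCOMMIT")

with engine.connect() as conn:
    conn.execute(db.text("CREATE TABLE t (x INTEGER)"))
    conn.execute(db.text("INSERT INTO t (x) VALUES (1)"))
    value = conn.execute(db.text("SELECT x FROM t")).scalar()
```

Here value comes back as 1, with no explicit commit anywhere in the code.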

Hence "the simplest, safest way for adding auto-commit behavior ― or explicitly committing after every dataframe write" is what you already had, unless to_sql() emits some statements that SQLA does not recognize as data-changing operations, which it does not, at least in recent versions.

It might be that the SQLA autocommit feature is on the way out in the next major release, but we'll have to wait and see.
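If the library-level autocommit does go away, an explicit transaction per write would be a version-proof alternative ― the "different API flow" the question alludes to. A minimal sketch, assuming Engine.begin() is available (the in-memory URL, table name results, and toy dataframes are placeholders for the real slow computation):

```python
import pandas as pd
import sqlalchemy as db

db_engine = db.create_engine("sqlite://")  # in-memory DB for illustration

for i in range(3):
    # stand-in for the slow computation producing some_df
    some_df = pd.DataFrame({"batch": [i], "value": [i * 10]})
    # engine.begin() opens a transaction that is committed when the
    # block exits without error, so each write is durable immediately
    with db_engine.begin() as conn:
        some_df.to_sql("results", con=conn, if_exists="append", index=False)

with db_engine.connect() as conn:
    row_count = conn.execute(db.text("SELECT COUNT(*) FROM results")).scalar()
```

Each iteration commits independently, so a crash mid-loop loses at most the in-flight write, not the already-completed ones.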

3 Comments

Thanks a lot. I think it's fair to say that this behavior is barely deducible for mere mortals from the referenced documentation section, so this is helpful. If I were the maintainers, I'd improve that section for greater bottom-line clarity, or make it obvious that auto-commit is effectively the default when there's no explicit transaction in user code.
And thanks for the heads-up about the possible future change.
I'll keep the other solution to this question in my code, for future compatibility.

