My program looks like this:


import sqlalchemy
engine = sqlalchemy.create_engine(DATABASE_URI)

~some code~

for group in groups:
    ~some code~
    for x in y:
        query = ~some query~
        params = ~some params~
        rows = engine.connect().execute(sqlalchemy.text(query), params)

This code works for two iterations of the loop and then hangs. After fetching the rows I build a DataFrame from them, and it looks fine.

Checking within postgres using this query:

SELECT pid, now() - pg_stat_activity.query_start AS duration, pg_stat_activity.query, pg_stat_activity.state
FROM pg_stat_activity
WHERE pg_stat_activity.state = 'active'
ORDER BY pg_stat_activity.query_start DESC;

I can see that there is a query stuck in the active state, which I guess is still running in the background, and this is what makes the program hang.

I found some posts about sessions, but I couldn't find a good example of how to use them, I don't understand the docs very well, and I am not sure whether sessions are the right solution.

The query I am using is the same each time, but with different parameters on each run.

If you can share any tips or links that could help me get through this problem, I would be thankful.

1 Answer

First, connect only once, and use a with block so that the connection is closed automatically when you are done:

import sqlalchemy
engine = sqlalchemy.create_engine(DATABASE_URI)

~some code~

with engine.connect() as conn:
    for group in groups:
        ~some code~
        for x in y:
            query = ~some query~
            params = ~some params~
            rows = conn.execute(sqlalchemy.text(query), params)
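
As a self-contained sketch of the same pattern: the example below uses an in-memory SQLite database as a stand-in for the PostgreSQL DATABASE_URI (an assumption, so that it runs wherever SQLAlchemy 1.4 or later is installed). The prices table, the groups dictionary and the fetch_prices function are hypothetical names made up for illustration. The point is that one connection is opened once, reused for every query, and handed back to the pool when the with block exits. In the original code, each engine.connect() call checks out another connection that is never closed explicitly, so its release depends on garbage collection.

```python
from sqlalchemy import create_engine, text

# Assumption: in-memory SQLite stands in for the real PostgreSQL DATABASE_URI.
engine = create_engine("sqlite:///:memory:")

# Hypothetical table and sample data, created once up front.
with engine.begin() as conn:  # begin() commits automatically on success
    conn.execute(text("CREATE TABLE prices (item_id INTEGER, price REAL)"))
    conn.execute(
        text("INSERT INTO prices (item_id, price) VALUES (:item_id, :price)"),
        [
            {"item_id": 1, "price": 10.0},
            {"item_id": 2, "price": 20.0},
            {"item_id": 3, "price": 30.0},
        ],
    )

# Hypothetical input: group name -> item ids to look up.
groups = {"a": [1, 2], "b": [3]}


def fetch_prices(engine, groups):
    """Run one parameterized query per item, all on a single connection."""
    query = text("SELECT price FROM prices WHERE item_id = :item_id")
    results = {}
    with engine.connect() as conn:  # opened once, closed when the block exits
        for group, item_ids in groups.items():
            for item_id in item_ids:
                rows = conn.execute(query, {"item_id": item_id}).fetchall()
                results[(group, item_id)] = [row[0] for row in rows]
    return results


results = fetch_prices(engine, groups)
```

Calling fetchall() inside the loop also means each result is fully consumed before the next query starts, so no cursor stays open between iterations.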

Secondly, it's safer to use sessions, since they have built-in transactions. If you only ever read from the database, there's no difference.

Thirdly, if you're not using the SQLAlchemy ORM or Core functionality, like session.query(User.id, User.name) or select([User.id, User.name]), you're only pretending to use SQLAlchemy. I don't mean to be rude, but if you're writing SQL queries as text and using SQLAlchemy only to execute them, you should use psycopg2 instead. Beware of SQL injection if this is the case.
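
To make the SQL injection warning concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for any DB-API driver (an assumption, chosen so it runs without a database server; psycopg2 works on the same principle but uses %s-style placeholders instead of ?). The users table and both functions are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])


def find_unsafe(conn, name):
    # DON'T: the input is pasted into the SQL text, so it can change the query.
    query = f"SELECT id FROM users WHERE name = '{name}' ORDER BY id"
    return [row[0] for row in conn.execute(query)]


def find_safe(conn, name):
    # DO: the value is passed as a bound parameter and is never parsed as SQL.
    query = "SELECT id FROM users WHERE name = ? ORDER BY id"
    return [row[0] for row in conn.execute(query, (name,))]


# Crafted input that turns the WHERE clause into "... OR '1'='1'".
malicious = "nobody' OR '1'='1"

leaked = find_unsafe(conn, malicious)   # matches every row in the table
blocked = find_safe(conn, malicious)    # matches nothing
```

The sqlalchemy.text(query) plus params form in the question is already the safe variant, as long as the values go through params and are not formatted into the query string.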

Fourthly, one query per item in a nested loop is a recipe for bad performance. If you can possibly avoid it, combine the queries into one. Without more context, we can't help you do that, but just to give you an example:

for brand in brands:
  for model in brand.models:
    <SELECT price
     FROM cars
     WHERE brand = brand.name
       AND model = model.name>

can be simplified to

<SELECT brand, model, price
 FROM cars WHERE brand IN ('Renault', 'Ferrari', 'Tesla', 'Fiat')>

Even if the single query returns a few too many rows, or you still need to do some post-processing, it's most likely worth that overhead.
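
A runnable sketch of that idea, again using the built-in sqlite3 module as a stand-in for the real database (an assumption). The cars table, the sample rows, the wanted dictionary and both functions are hypothetical. It compares one query per (brand, model) pair with a single IN query followed by filtering in Python.

```python
import sqlite3

# Assumption: in-memory SQLite with made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (brand TEXT, model TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [
        ("Renault", "Clio", 18000),
        ("Tesla", "Model 3", 40000),
        ("Tesla", "Model S", 90000),
        ("Fiat", "500", 16000),
    ],
)

# Hypothetical input: brand -> models we want prices for.
wanted = {"Renault": ["Clio"], "Tesla": ["Model 3"]}


def prices_one_by_one(conn, wanted):
    """One round trip per (brand, model) pair: the slow nested-loop version."""
    prices = {}
    for brand, models in wanted.items():
        for model in models:
            row = conn.execute(
                "SELECT price FROM cars WHERE brand = ? AND model = ?",
                (brand, model),
            ).fetchone()
            if row is not None:
                prices[(brand, model)] = row[0]
    return prices


def prices_combined(conn, wanted):
    """One round trip for all brands, then filter the extra rows in Python."""
    brands = list(wanted)
    if not brands:
        return {}
    # Only "?" markers are formatted into the SQL text, never the values.
    placeholders = ", ".join("?" for _ in brands)
    query = f"SELECT brand, model, price FROM cars WHERE brand IN ({placeholders})"
    prices = {}
    for brand, model, price in conn.execute(query, brands):
        if model in wanted[brand]:  # drop models that were not requested
            prices[(brand, model)] = price
    return prices


slow = prices_one_by_one(conn, wanted)
fast = prices_combined(conn, wanted)
```

Both functions return the same dictionary here. The combined version fetches one extra row (the Tesla Model S) and discards it, which is the "few too many rows" trade-off mentioned above.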

2 Comments

Thank you for the tips. I am not using the ORM. I am automating an Excel build for our customer, and it requires querying the DB with a function using different parameters. So basically, if I'm not using the ORM, it's better to use psycopg2 and query the DB directly? And about sessions, could you point me to a good example to understand how to use them? I am not sure what a session is.
Yes, it's better to use psycopg2 if you're hardcoding queries, since that is exactly what SQLAlchemy does anyway. Sessions are a SQLAlchemy-only thing.
