I'm using this code to update several records on Redshift (around 30,000 records per run).

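from psycopg2.extras import RealDictCursor

# conn is an already-open psycopg2 connection to the Redshift cluster
cur = conn.cursor(cursor_factory=RealDictCursor)
sql_string_update = """UPDATE my_table SET "outlier_reason" = {0} WHERE "id" = {1};"""
# One UPDATE statement (and one round trip) per DataFrame row
for row_id, row in df_ignored.iterrows():
    sql_ = sql_string_update.format(row['outlier_reason'], row_id)
    cur.execute(sql_)
conn.commit()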

Every run of around 30,000 records takes up to 2 hours to execute.

Is there a way to speed up this query?

    You are running 30,000 separate updates against the database, so there is no way this can get any faster. My recommendation is to build logic that 1. creates a file in S3 containing the new versions of the rows, 2. deletes the rows that need to be updated, and 3. uses COPY to load the data from S3 into Redshift. Let me know if you need more clarification. Commented Nov 28, 2018 at 18:58

1 Answer

I think that instead of touching the table and updating rows one by one, you should use an ETL-style approach, which I believe would be much faster. It should take care of 30K records in a few minutes. Here is the approach.

  1. Create a staging table, say stg_my_table (id, outlier_reason).
  2. Write your Python program's data to a CSV or JSON file, whichever suits your case. Save it to S3 or EC2.
  3. Use the COPY command to load the file into stg_my_table, including the ID.
  4. Update my_table by joining it with stg_my_table on the ID and setting outlier_reason.

I think the above solution should reduce the processing time from 2 hours to a few minutes. Try it manually first, before writing the actual code. I'm sure you will see very promising results, and you can then optimize each of the steps above one by one to gain even more performance.
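
The steps in this answer could be sketched roughly as follows. This is only an illustration under several assumptions: the column types (BIGINT, VARCHAR(256)), the helper names rows_to_csv, build_statements and apply_bulk_update, and the s3_uri and iam_role arguments are all hypothetical and not from the question. Uploading the CSV to S3 (step 2) is left out, and conn is assumed to be an open psycopg2 connection like the one in the question.

```python
import csv
import io

STAGING_TABLE = "stg_my_table"  # staging table name suggested in the answer


def rows_to_csv(rows):
    """Step 2: serialize (id, outlier_reason) pairs to CSV text for COPY."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row_id, reason in rows:
        writer.writerow([row_id, reason])
    return buf.getvalue()


def build_statements(s3_uri, iam_role):
    """Return the SQL for steps 1, 3 and 4, in execution order.

    The column types are assumptions; match them to my_table.
    s3_uri and iam_role are placeholders supplied by the caller.
    """
    return [
        # Step 1: staging table that lives only for this session
        f"CREATE TEMP TABLE {STAGING_TABLE} "
        f"(id BIGINT, outlier_reason VARCHAR(256));",
        # Step 3: bulk-load the CSV file that was uploaded to S3
        f"COPY {STAGING_TABLE} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role}' FORMAT AS CSV;",
        # Step 4: one set-based UPDATE instead of one UPDATE per row
        f'UPDATE my_table SET "outlier_reason" = s.outlier_reason '
        f'FROM {STAGING_TABLE} s WHERE my_table."id" = s.id;',
    ]


def apply_bulk_update(conn, s3_uri, iam_role):
    """Run the three statements in a single transaction."""
    cur = conn.cursor()
    for stmt in build_statements(s3_uri, iam_role):
        cur.execute(stmt)
    conn.commit()
```

With the DataFrame from the question, the CSV text could be produced with something like rows_to_csv(df_ignored["outlier_reason"].items()), uploaded to S3 with whatever client you already use, and then applied with apply_bulk_update(conn, s3_uri, iam_role). The point of the design is that Redshift sees three statements per run instead of 30,000.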

2 Comments

Thanks for the answer. That sounds like a lot of work. I'll give it a try; pity there is no simpler solution.
@otmezger Redshift is not designed for very frequent updates, and certainly not for updating individual records one at a time, because it is a columnar database.
