I'm using the code below to update records on Redshift (around 30,000 records per run).
import psycopg2
from psycopg2.extras import RealDictCursor

# conn is an existing psycopg2 connection to the Redshift cluster
cur = conn.cursor(cursor_factory=RealDictCursor)
sql_string_update = """UPDATE my_table SET "outlier_reason" = %s WHERE "id" = %s;"""
for id, row in df_ignored.iterrows():
    # one UPDATE statement (and one network round trip) per row
    cur.execute(sql_string_update, (row['outlier_reason'], id))
conn.commit()
Every run of around 30,000 rows takes up to 2 hours to execute.
Is there a way to speed this up?
Redshift is a columnar warehouse and is not built for row-by-row UPDATEs; the usual approach is to turn the update into a bulk load:

1. Use COPY to load the new data from S3 into a staging table.
2. DELETE the rows that need to be updated from the target table.
3. INSERT the new rows from the staging table.

Let me know if you need more clarification.
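A minimal sketch of that pattern with psycopg2, assuming the updated rows (complete rows of my_table, not just the changed column) have already been exported from df_ignored to S3 as CSV; the bucket path, staging table name, and IAM role below are placeholders:

# conn is an existing psycopg2 connection to the Redshift cluster
cur = conn.cursor()

# 1. Stage the new data: COPY is a bulk, parallel load from S3.
cur.execute("CREATE TEMP TABLE my_table_staging (LIKE my_table);")
cur.execute("""
    COPY my_table_staging
    FROM 's3://my-bucket/outlier_reasons.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
    CSV;
""")

# 2. Delete the old versions of the rows that are being replaced.
cur.execute("""
    DELETE FROM my_table
    USING my_table_staging
    WHERE my_table.id = my_table_staging.id;
""")

# 3. Insert the new versions in a single set-based statement.
cur.execute("INSERT INTO my_table SELECT * FROM my_table_staging;")

conn.commit()

Each of the three statements is one set-based operation, so the 30,000-row run becomes a handful of round trips instead of 30,000 individual UPDATEs.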