I am trying to populate a Postgres table from another table that has about 24 million records, but the query is too slow: it takes 9-10 hours. The update operation only updates 1-2 rows each second. I can't understand why it is so slow.

Current benchmark

  • Query = INSERT INTO .... SELECT FROM .... ON CONFLICT DO UPDATE
  • Source table has 24 million records
  • Destination table already has 560 million records, with indexes, unique keys, primary and foreign keys

Query(Sample)

INSERT INTO destination_tbl (col1, col2 .... , col22, processed, updated_at)
SELECT col1, col2 .... , col22, false AS processed, null AS updated_at FROM source_tbl
WHERE processed = false
ON CONFLICT (unique_cols...)
DO UPDATE
SET col1 = EXCLUDED.col1,
        ....
        col22 = EXCLUDED.col22,
        processed = false,
        updated_at = now()
  • 24 million in 10 hours is a lot more than 2 each second. Commented Jan 27, 2023 at 4:33
  • the query has updated 872 records per second. Not so bad Commented Jan 27, 2023 at 7:57
  • I wouldn't be surprised if I/O is throttled by Google Cloud SQL somehow Commented Jan 27, 2023 at 14:36
  • @Harish Nandoliya, please let me know if below information was helpful. Commented Feb 1, 2023 at 5:59
  • @VaidehiJamankar, I finally figured out it was due to a huge number of dead tuples; I had nearly 2 billion dead tuples. Commented Feb 1, 2023 at 9:34

1 Answer

The performance you describe is consistent with the query you are running. It is a plain insert that uses INSERT ... ON CONFLICT, which is one of the ways to upsert data. For performance, however, it matters a lot whether you use ON CONFLICT DO NOTHING or a DO UPDATE clause.
Generally, when the DO NOTHING branch runs, no dead tuples are left behind to clean up, whereas every row that goes through DO UPDATE leaves a dead tuple (the old row version). Cleaning up these dead tuples takes time, and until they are cleaned up they bloat the table and its indexes, which can add to the total time of the load. Note also that INSERT ... ON CONFLICT always performs a read to determine the necessary writes. Some databases (CockroachDB, for example) offer a separate UPSERT statement that writes without reading, which makes it faster, although for tables with secondary indexes there is no performance difference between UPSERT and INSERT ON CONFLICT. PostgreSQL has no separate UPSERT statement, so that comparison does not apply there.
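One way to see whether dead tuples are piling up is to look at PostgreSQL's statistics view for the destination table. This is a minimal sketch, assuming the table is named destination_tbl as in the question; the counts are estimates maintained by the statistics collector, not exact figures.

```sql
-- Estimated live vs. dead tuple counts and the last (auto)vacuum times
-- for the destination table (name taken from the question).
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'destination_tbl';

-- If n_dead_tup is very large, a manual vacuum marks that space as reusable
-- and refreshes the planner statistics.
VACUUM (VERBOSE, ANALYZE) destination_tbl;
```

On a table of this size the vacuum itself may run for a long time, so it is probably best scheduled outside the load window.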
Check the factors above and see whether batch loads are possible, or whether the query can be split into smaller parts, either of which should reduce the execution time. Setting a suitable fillfactor value on the table should also help reduce the time.
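A rough sketch of what batching and a fillfactor change could look like. The names here are assumptions for illustration: id is a hypothetical integer key on source_tbl, unique_col stands in for the conflict target elided in the question, and only two of the data columns are shown. The fillfactor value of 90 is just an example, not a recommendation.

```sql
-- Leave free space in each page so an updated row version can stay on the
-- same page. This generally helps only when the update does not change any
-- indexed column, and it applies only to pages written after the change.
ALTER TABLE destination_tbl SET (fillfactor = 90);

-- One batch: upsert a bounded slice of the source instead of all rows at once.
-- Repeat with the next id range (hypothetical key) until the source is covered.
INSERT INTO destination_tbl (unique_col, col1, col2, processed, updated_at)
SELECT unique_col, col1, col2, false, NULL
FROM source_tbl
WHERE processed = false
  AND id >= 1 AND id < 100001
ON CONFLICT (unique_col)
DO UPDATE
SET col1 = EXCLUDED.col1,
    col2 = EXCLUDED.col2,
    processed = false,
    updated_at = now();
```

Running each slice in its own transaction keeps transactions short and gives autovacuum a chance to clean up dead tuples between batches.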
