We need to design a process that efficiently imports large CSV files, created by upstream processes on a regular basis, into AlloyDB. We'd like to use Python for this task. What is the best practice in this case?
Some considerations:
- Plain SQL INSERT statements are far less performant than a database-specific import tool like pg_restore
- While pg_restore can be executed remotely, I'd expect import performance for huge files to be significantly better when run locally on the DB server, because it avoids the network round trips
- The AlloyDB documentation says: SSH into the DB server from a container, copy the file from a GCS bucket to local disk, and run psql COPY / pg_restore. This is not a very convenient set of actions to do programmatically.
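For context, here is roughly what the "do it ourselves from Python" route would look like if we skip the SSH step and instead stream the CSV straight from GCS into a client-side COPY. This is only a sketch, not the documented procedure: the bucket, object, table, and connection details are placeholders, and it assumes psycopg2 plus the google-cloud-storage client.

```python
"""Sketch: stream a CSV from GCS into Postgres COPY without staging it locally.
Bucket/table/connection values below are placeholders."""
from google.cloud import storage   # pip install google-cloud-storage
import psycopg2                    # pip install psycopg2-binary

GCS_BUCKET = "my-upstream-exports"        # placeholder
GCS_OBJECT = "exports/orders_2024.csv"    # placeholder
TARGET_TABLE = "staging.orders"           # placeholder

def import_csv_from_gcs():
    blob = storage.Client().bucket(GCS_BUCKET).blob(GCS_OBJECT)
    conn = psycopg2.connect(host="10.0.0.5", dbname="appdb",
                            user="importer", password="...")  # placeholder DSN
    try:
        with conn, conn.cursor() as cur, blob.open("rb") as csv_stream:
            # COPY ... FROM STDIN reads the stream in chunks, so the file is
            # never fully materialized in memory or on local disk.
            cur.copy_expert(
                f"COPY {TARGET_TABLE} FROM STDIN WITH (FORMAT csv, HEADER true)",
                csv_stream,
            )
    finally:
        conn.close()
```

This still pays the network cost per the second consideration above, which is exactly why we're asking whether there is a better, server-side option for AlloyDB.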
We have a similar setup with a CloudSQL Postgres instance. In contrast to AlloyDB, CloudSQL offers a nice API that acts as an abstraction layer and handles the whole import of the file, which takes a lot of burden off the developer.
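For comparison, this is roughly what our CloudSQL flow boils down to: a single call to the Cloud SQL Admin API and the service does the rest. Again a sketch with placeholder project, instance, bucket, and table names, assuming the google-api-python-client discovery client; exact field names may differ by API version.

```python
"""Sketch: trigger a server-side CSV import on CloudSQL via the Admin API.
Project/instance/bucket/table names are placeholders."""
import time
from googleapiclient import discovery  # pip install google-api-python-client

def import_csv_into_cloudsql():
    sqladmin = discovery.build("sqladmin", "v1beta4")
    body = {
        "importContext": {
            "fileType": "CSV",
            "uri": "gs://my-upstream-exports/exports/orders_2024.csv",  # placeholder
            "database": "appdb",                                        # placeholder
            "csvImportOptions": {"table": "staging.orders"},            # placeholder
        }
    }
    # The generated method is import_() because "import" is reserved in Python.
    operation = sqladmin.instances().import_(
        project="my-project", instance="my-cloudsql-instance", body=body
    ).execute()

    # Poll the long-running operation until the import completes.
    while operation.get("status") != "DONE":
        time.sleep(10)
        operation = sqladmin.operations().get(
            project="my-project", operation=operation["name"]
        ).execute()
```

Is there an equivalent managed import path for AlloyDB, or a recommended pattern that comes close to this level of convenience?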