I have a small daily computing job that imports data from BigQuery, uses Python numerical computing libraries (pandas, NumPy) to process it, and then writes the results to an external table (Firestore or MySQL in another project).
What is the recommended way to deploy it on GCP?
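To make the shape of the job concrete, here is a minimal sketch. The table names, connection string, and the transform itself are placeholders, not my real pipeline; the transform is kept as a pure pandas function so it can be exercised without any GCP credentials:

```python
# Sketch of the daily job: BigQuery in, pandas transform, MySQL out.
# Table names, query, and connection string below are illustrative only.
import pandas as pd


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Placeholder numerical step: aggregate a metric per key."""
    return df.groupby("key", as_index=False)["value"].sum()


def run() -> None:
    # Deferred imports so the pure transform stays testable offline.
    from google.cloud import bigquery
    import sqlalchemy

    client = bigquery.Client()
    df = client.query(
        "SELECT key, value FROM `my_project.my_dataset.my_table`"
    ).to_dataframe()
    result = transform(df)

    # Write to the MySQL instance in the other project.
    engine = sqlalchemy.create_engine("mysql+pymysql://user:pass@host/db")
    result.to_sql("daily_results", engine, if_exists="replace", index=False)


if __name__ == "__main__":
    run()
```

The whole thing runs in a few minutes on a single machine, which is why a distributed framework feels heavier than needed.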
Our DevOps team advises us against creating a single VM just for a batch job. They would prefer not to manage VM infrastructure themselves, and they argue there should be managed services designed for batch workloads. They insist that I use Dataflow, but I think Dataflow's distributed nature is overkill for a job this size.
Many thanks,
Updated October 14, 2019:
I'm thinking about dockerizing the batch job and deploying it to a Kubernetes (GKE) cluster as a CronJob. The downside is that the cluster would need to host several jobs to be worth the setup and maintenance effort. Can someone advise on the feasibility and suitability of this approach?
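For the container route, the job's entry point would read its parameters from environment variables set in the CronJob spec. A minimal sketch, where the variable names and defaults are made up for illustration:

```python
# Minimal container entrypoint sketch for a GKE CronJob, assuming job
# parameters arrive as environment variables (names are illustrative).
import os


def load_config(env) -> dict:
    """Read job parameters from the environment with fallback defaults."""
    return {
        "source_table": env.get("SOURCE_TABLE", "my_dataset.my_table"),
        "target_dsn": env.get("TARGET_DSN", "mysql+pymysql://user:pass@host/db"),
        "chunk_size": int(env.get("CHUNK_SIZE", "10000")),
    }


def main() -> None:
    config = load_config(os.environ)
    print(f"running batch against {config['source_table']}")
    # ... BigQuery read, pandas processing, MySQL write go here ...


if __name__ == "__main__":
    main()
```

Keeping all configuration in the environment means the same image can serve several scheduled jobs, which partly addresses the "worth the setup" concern.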
Updated October 15, 2019:
Thanks to Alex Titov for his comment at https://googlecloud-community.slack.com/archives/C0G6VB4UE/p1571032864020000. Based on his suggestion, I'm going to break the job into several small Cloud Functions and chain them together into a pipeline with Cloud Scheduler and/or Cloud Composer.
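Each stage would be a background Cloud Function triggered by a Pub/Sub topic, with Cloud Scheduler publishing the first message on a daily cron and each stage handing off to the next topic. A sketch of one stage, where the payload shape and the work inside are assumptions for illustration:

```python
# One pipeline stage as a Pub/Sub-triggered background Cloud Function.
# Cloud Scheduler (or the previous stage) publishes a base64-encoded JSON
# message; the payload shape here is made up for illustration.
import base64
import json


def extract_payload(event: dict) -> dict:
    """Decode the JSON payload carried in the Pub/Sub message body."""
    return json.loads(base64.b64decode(event["data"]).decode("utf-8"))


def process_stage(event, context=None):
    """Entry point: run this stage's share of the work, then hand off."""
    payload = extract_payload(event)
    # ... this stage's slice of the BigQuery/pandas work goes here ...
    result = {"rows_processed": payload.get("rows", 0)}
    # In production, publish `result` to the next stage's topic, e.g. via
    # google.cloud.pubsub_v1.PublisherClient().publish(...).
    return result
```

Splitting the job this way also keeps each function under the Cloud Functions execution time limit, which a monolithic run might exceed.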