
By "Google Batch" I'm referring to the new service Google launched about a month ago.

https://cloud.google.com/batch

I have a Python script which currently takes a few minutes to execute. However, with the volume of data it will be processing in the next few months, this execution time will go from minutes to hours. That is why I'm not using Cloud Functions or Cloud Run to run this script; both have a maximum 60-minute execution time.

Google Batch launched recently, and I wanted to explore it as a possible way to achieve what I'm looking for without just using Compute Engine.

However, documentation is sparse across the internet, and I can't find a way to "trigger" an already created Batch job using Cloud Scheduler. I've already successfully created a Batch job manually which runs my Docker image. Now I need something to trigger this Batch job once a day, that's it. It would be wonderful if Cloud Scheduler could serve this purpose.

I've seen one article describing using a GCP Workflow to create a new Batch job on a cron schedule determined by Cloud Scheduler. The issue with this is that it creates a new Batch job every time rather than simply re-running the existing one. To be honest, I can't even re-run an already executed Batch job on the GCP website itself, so I don't know if that's even possible.

https://www.intertec.io/resource/python-script-on-gcp-batch

Lastly, I've even explored the official Google Batch Python library and could not find any built-in function which allows me to "call" a previously created Batch job and re-run it.

https://github.com/googleapis/python-batch

2 Comments

  • I'm unfamiliar with Batch but familiar with Google Cloud generally. From a quick read, it appears that Batch Jobs, when created, are run. There's no (obvious?) mechanism to rerun a completed Job. An authenticated (!) POST to the endpoint documented here is what you'll want to specify as the HTTP target (!) for Cloud Scheduler. Commented Oct 5, 2022 at 23:20
  • Because Cloud Scheduler supports only a limited set of types, you can't provide it with Batch "types" directly, i.e. if you create the Batch Job using Google's Python SDK, you'll need to convert it into JSON before passing it to Google's Cloud Scheduler Python SDK, because an HTTP target expects e.g. a URI and a (JSON) message body. Commented Oct 5, 2022 at 23:22
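Taken together, the two comments above amount to the following shape. This is a minimal stdlib sketch only: the project, region, job ID, and image values are placeholders, and the field names follow my reading of the Batch REST Job resource.

```python
import json

# Placeholder values - substitute your own.
project = "my-project"
location = "us-central1"
job_id = "my-batch-job"

# The endpoint Cloud Scheduler will POST to, authenticated with an OAuth
# token for a service account holding roles/batch.jobsEditor.
uri = (
    f"https://batch.googleapis.com/v1/projects/{project}"
    f"/locations/{location}/jobs?job_id={job_id}"
)

# The POST body is the Batch Job resource serialized as JSON, encoded to
# bytes as Cloud Scheduler's HttpTarget.body requires.
batch_job = {
    "taskGroups": [{
        "taskSpec": {
            "runnables": [{
                "container": {"imageUri": "gcr.io/my-project/my-image"},
            }],
        },
    }],
    "logsPolicy": {"destination": "CLOUD_LOGGING"},
}
body = json.dumps(batch_job).encode("utf-8")

print(uri)
```

The accepted answer below builds exactly this pair (URI plus JSON body) using the official client libraries instead of raw dicts.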

2 Answers

I wrote this for you this morning as a guide.

It uses Google's example in combination with Cloud Scheduler:

# Used to correctly (!?) form Batch Job
import google.cloud.batch_v1.types

import google.cloud.scheduler_v1
import google.cloud.scheduler_v1.types

import os


project = os.getenv("PROJECT")
number = os.getenv("NUMBER")
location = os.getenv("LOCATION")
job = os.getenv("JOB")

# Batch Job
# Create Batch Job using batch_v1.types
# Alternatively, create this from scratch
batch_job = google.cloud.batch_v1.types.Job(
    priority=0,
    task_groups=[
        google.cloud.batch_v1.types.TaskGroup(
            task_spec=google.cloud.batch_v1.types.TaskSpec(
                runnables=[
                    google.cloud.batch_v1.types.Runnable(
                        container=google.cloud.batch_v1.types.Runnable.Container(
                            image_uri="gcr.io/google-containers/busybox",
                            entrypoint="/bin/sh",
                            commands=[
                                "-c",
                                "echo \"Hello world! This is task ${BATCH_TASK_INDEX}. This job has a total of ${BATCH_TASK_COUNT} tasks.\""
                            ],
                        ),
                    ),
                ],
                compute_resource=google.cloud.batch_v1.types.ComputeResource(
                    cpu_milli=2000,
                    memory_mib=16,
                )
            ),
            task_count=1,
            parallelism=1,
        ),
    ],
    allocation_policy=google.cloud.batch_v1.types.AllocationPolicy(
        location=google.cloud.batch_v1.types.AllocationPolicy.LocationPolicy(
            allowed_locations=[
                f"regions/{location}",
            ],
        ),
        instances=[
            google.cloud.batch_v1.types.AllocationPolicy.InstancePolicyOrTemplate(
                policy=google.cloud.batch_v1.types.AllocationPolicy.InstancePolicy(
                    machine_type="e2-standard-2",
                ),
            ),
        ],
    ),
    labels={
        "stackoverflow":"73966292",
    },
    logs_policy=google.cloud.batch_v1.types.LogsPolicy(
        destination=google.cloud.batch_v1.types.LogsPolicy.Destination.CLOUD_LOGGING,
    ),
)

# Convert the Google Batch Job into JSON
# Google uses Proto Python
# https://proto-plus-python.readthedocs.io/en/stable/messages.html?highlight=JSON#serialization
batch_json=google.cloud.batch_v1.types.Job.to_json(batch_job)
print(batch_json)

# Convert JSON to bytes as required for body by Cloud Scheduler
body=batch_json.encode("utf-8")

# Run hourly on the hour (HH:00)
schedule = "0 * * * *"

parent = f"projects/{project}/locations/{location}"
name = f"{parent}/jobs/{job}"
uri = f"https://batch.googleapis.com/v1/{parent}/jobs?job_id={job}"

service_account_email = f"{number}-compute@developer.gserviceaccount.com"

scheduler_job = google.cloud.scheduler_v1.types.Job(
    name=name,
    description="description",
    http_target=google.cloud.scheduler_v1.types.HttpTarget(
        uri=uri,
        http_method=google.cloud.scheduler_v1.types.HttpMethod.POST,
        oauth_token=google.cloud.scheduler_v1.types.OAuthToken(
            service_account_email=service_account_email,
        ),
        body=body,
    ),
    schedule=schedule,
)

scheduler_json=google.cloud.scheduler_v1.Job.to_json(scheduler_job)
print(scheduler_json)

request = google.cloud.scheduler_v1.CreateJobRequest(
    parent=parent,
    job=scheduler_job,
)

scheduler_client = google.cloud.scheduler_v1.CloudSchedulerClient()
print(
    scheduler_client.create_job(
        request=request
    )
)
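Two caveats on the script above. First, the sample schedule ("0 * * * *") is hourly; for the once-a-day run in the question you'd use something like "0 9 * * *". Second, the URI pins job_id={job}, and since Batch job IDs must be unique within a project and region, a second scheduled POST with the same ID would likely be rejected. Cloud Scheduler's URI and body are static, so if you instead generate jobs from a Workflow or a small function (as in the article linked in the question), you can stamp a unique ID per run. A sketch of such a hypothetical helper (the lowercase/63-character constraint is my reading of the Batch docs):

```python
import datetime
import re


def unique_job_id(prefix: str) -> str:
    """Append a UTC timestamp so each run submits a distinct Batch job."""
    stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%d-%H%M%S")
    job_id = f"{prefix}-{stamp}"
    # Batch job IDs: lowercase letters, digits and hyphens, starting with
    # a letter, at most 63 characters (as I read the documentation).
    if not re.fullmatch(r"[a-z][a-z0-9-]{0,62}", job_id):
        raise ValueError(f"invalid Batch job ID: {job_id}")
    return job_id


print(unique_job_id("tester"))
```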

You can test using:

BILLING="..."
PROJECT="..."
LOCATION="..." # E.g. us-west1

JOB="tester"

ACCOUNT="tester"
EMAIL="${ACCOUNT}@${PROJECT}.iam.gserviceaccount.com"

# Create Project and enable Billing
gcloud projects create ${PROJECT}
gcloud beta billing projects link ${PROJECT} \
--billing-account=${BILLING}

# Enable Batch, Cloud Scheduler and Compute Engine
SERVICES=(
  "batch"
  "cloudscheduler"
  "compute"
)
for SERVICE in "${SERVICES[@]}"
do
  gcloud services enable ${SERVICE}.googleapis.com \
  --project=${PROJECT}
done

# Create Service Account
gcloud iam service-accounts create ${ACCOUNT} \
--project=${PROJECT}

gcloud iam service-accounts keys create ${PWD}/${ACCOUNT}.json \
--iam-account=${EMAIL} \
--project=${PROJECT}

# IAM
# https://cloud.google.com/iam/docs/understanding-roles#cloud-scheduler-roles
ROLES=(
  "roles/batch.jobsEditor"
  "roles/cloudscheduler.admin"
)
for ROLE in "${ROLES[@]}"
do
  gcloud projects add-iam-policy-binding ${PROJECT} \
  --member=serviceAccount:${EMAIL} \
  --role=${ROLE}
done

# ActAs
NUMBER=$(\
  gcloud projects describe ${PROJECT} \
  --format="value(projectNumber)")
COMPUTE_ENGINE="${NUMBER}-compute@developer.gserviceaccount.com"
gcloud iam service-accounts add-iam-policy-binding ${COMPUTE_ENGINE} \
--member=serviceAccount:${EMAIL} \
--role="roles/iam.serviceAccountUser" \
--project=${PROJECT}

Then:

python3 -m venv venv
source venv/bin/activate

# Or requirements.txt
python3 -m pip install google-cloud-batch
python3 -m pip install google-cloud-scheduler

export JOB
export LOCATION
export NUMBER
export PROJECT

export GOOGLE_APPLICATION_CREDENTIALS=${PWD}/${ACCOUNT}.json

python3 main.py

2 Comments

This is incredible, thank you! Will do a test. Speaking of tests, where would you execute the bash script you included? Locally or in the Google CLI?
You're welcome. Your question piqued my interest. I'd run the bash script on my local development machine, but you could run it on a Linux VM with gcloud installed, including Cloud Shell. Just be mindful that it creates a Service Account key, so you should treat that with care.
There is a misunderstanding. When you use Cloud Run jobs, you create a configuration and then you execute that configuration.

With Batch jobs, however, you simply execute a configuration. That's all; there is no configuration to create in advance.

Have a look at the APIs: Create, Get, Delete. No more.

Therefore, you have to put the whole Batch configuration in your Cloud Scheduler job, so that it creates a new job each run. Take care NOT to set the jobId in the query parameter.
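A stdlib sketch of what this answer's Scheduler target could look like. The project, region, and image are placeholders, and the assumption that Batch assigns an ID when the job_id query parameter is omitted is my reading of this answer and the REST reference.

```python
import json

project, location = "my-project", "us-central1"  # placeholders

# No ?job_id=... query parameter, so successive scheduled runs don't
# collide on an already-used job name.
uri = f"https://batch.googleapis.com/v1/projects/{project}/locations/{location}/jobs"

# The whole Batch Job configuration travels in the POST body.
body = json.dumps({
    "taskGroups": [{
        "taskSpec": {
            "runnables": [{
                "container": {"imageUri": "gcr.io/my-project/my-image"},
            }],
        },
    }],
}).encode("utf-8")

print(uri)
```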

1 Comment

So if I were to use Cloud Scheduler to create a Batch job, I would send an HTTP POST to the appropriate Batch API endpoint, and within the body of the Cloud Scheduler job I would pass an entire JSON job definition as shown here: cloud.google.com/batch/docs/reference/rest/v1alpha/… Does that sound right?
