I'm writing a Python script that runs a certain SELECT query asynchronously. The first run succeeds, but every run after that fails with the following error:

google.cloud.exceptions.Conflict: 409 Already Exists: Job ps-bigdata:vci-temp-sales-query-job (POST https://www.googleapis.com/bigquery/v2/projects/ps-bigdata/jobs)

Here is a code snippet:

from google.cloud import bigquery

google_auth_json_file = './myprojectauth.json'
client = bigquery.Client.from_service_account_json( google_auth_json_file )

project = 'myProject'
dataset = 'myDataset'
ds = client.dataset(dataset)
query = "SELECT X,y,z FROM mytable;"

#--- Clear/create temp table
temp_table_name = 'myTempTable'
temp_tbl = myCreateTempTableFunction( client, project, dataset, temp_table_name )

#--- Create an async query job
job_name = 'vci-temp-sales-query-job'
job = client.run_async_query(job_name, query)
job.destination = temp_tbl
job.write_disposition = 'WRITE_TRUNCATE'
job.begin()

This script fails at the "job.begin()" line. I didn't know that named jobs live on beyond the end of the session or the execution of the job. How do I check whether a named job already exists, and if it does, how do I delete it so I can create a new one? Or do I have to create a random or unique job name every time I run an async job?

  • You can check if a job exists with job.exists(). If it exists, then you can cancel it with job.cancel(). You may want to check job.ended before you cancel it. Commented Jul 3, 2017 at 22:31

1 Answer

You need to use a unique job ID, since this is what the metadata for the operation is associated with. Referring to the querying data example, your code could be something like this:

import uuid
job_name = 'vci-temp-sales-query-job_{}'.format(uuid.uuid4())
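
If you start jobs in more than one place, it may be convenient to wrap this in a small helper. The sketch below is only an illustration: make_job_id is a hypothetical name of my own, not part of the google-cloud-bigquery library, and it assumes a readable prefix followed by a random UUID4 is acceptable as a job ID.

```python
import uuid


def make_job_id(prefix):
    """Return '<prefix>_<random UUID4>' (hypothetical helper, not a library call)."""
    # uuid4() is random, so two calls with the same prefix give different IDs.
    return '{}_{}'.format(prefix, uuid.uuid4())


# Each run of the script gets its own job ID, so a rerun does not
# collide with the job created by an earlier run.
job_name = make_job_id('vci-temp-sales-query-job')
second_job_name = make_job_id('vci-temp-sales-query-job')
```

The resulting job_name can then be passed to client.run_async_query(job_name, query) exactly as in the question.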

3 Comments

I just found that answer in an example code snippet somewhere too. Thank you! The job ID passed to the client.run_async_query() method must be unique, so adding "import uuid" and using "uuid.uuid4()" to get a unique ID is the best option.
Is there a specific reason why BigQuery is designed to take a unique job ID every time?
You can interact with or retrieve information about the job using this ID. If there's an active job with the same ID, as in the OP's question, then there would be no way e.g. to get the results of or cancel the job.
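
To make that reasoning concrete, here is a toy in-memory model. It is not how BigQuery is implemented; ToyJobRegistry and this Conflict class are made-up names for illustration only. It just shows why a lookup keyed by job ID has to reject a second job with the same ID.

```python
class Conflict(Exception):
    """Stand-in for a 409 error (hypothetical, not the google.cloud class)."""


class ToyJobRegistry(object):
    """Toy model: job metadata is stored and looked up by job ID."""

    def __init__(self):
        self._jobs = {}

    def begin(self, job_id, query):
        # A second job with the same ID would make lookups ambiguous,
        # so the registry refuses it instead of overwriting the first.
        if job_id in self._jobs:
            raise Conflict('409 Already Exists: Job {}'.format(job_id))
        self._jobs[job_id] = {'query': query, 'state': 'RUNNING'}

    def get(self, job_id):
        return self._jobs[job_id]


registry = ToyJobRegistry()
registry.begin('vci-temp-sales-query-job', 'SELECT 1')

try:
    registry.begin('vci-temp-sales-query-job', 'SELECT 2')
    reused_id_rejected = False
except Conflict:
    reused_id_rejected = True
```

After this runs, reused_id_rejected is True and registry.get('vci-temp-sales-query-job') still returns the first job's metadata.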
