I'm trying to create a clustered table in BigQuery.

When I test it in the UI, it works perfectly:

CREATE OR REPLACE TABLE `project_id_xyz.temp.clustering`
PARTITION BY date
CLUSTER BY cluster_col AS
SELECT CURRENT_DATE() as date, 1 as cluster_col

However, when I try the same via google-cloud-bigquery==1.9.0 in Python (3.7.1), the table is created and partitioned but not clustered, as seen in the "Details" tab in the UI.

Here is the snippet I use to create the table.

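from google.cloud import bigquery
from google.cloud.bigquery import TimePartitioning

# Assumes default credentials and a default project are configured.
client = bigquery.Client()

# Destination table for the query results.
dataset = client.dataset("temp")
table = dataset.table("clustering_test")
job_config = bigquery.QueryJobConfig()
job_config.destination = table
job_config.write_disposition = "WRITE_TRUNCATE"

# Partition on the "date" column and cluster on "cluster_col".
time_partitioning = TimePartitioning()
time_partitioning.field = "date"
job_config.time_partitioning = time_partitioning
job_config.clustering_fields = ["cluster_col"]

sql = """
    SELECT CURRENT_DATE() as date, 1 as cluster_col
"""
query_job = client.query(
    sql,
    location='US',
    job_config=job_config)

# Block until the query job has finished.
query_job.result()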

The code seems straightforward and doesn't throw any exceptions.

Is there anything obvious that I'm doing wrong?

  • Are you sure that you don't need the entire query (as you wrote it in the UI) in your SQL string? Normally, in Java, I have to specify all performed actions as well as the project/dataset/table. Commented Mar 6, 2019 at 21:22
  • Why don't you just run the same query using the client API? Commented Mar 6, 2019 at 23:08
  • It's easier to change the job config programmatically via job_config rather than trying to parse and change the SQL code directly. That's why their API offers two ways of doing it, I guess. Commented Mar 7, 2019 at 8:39

1 Answer

I ran your Python code and can confirm that it works as expected, including the clustering settings.
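For reference, a minimal sketch of how the result could be checked programmatically rather than in the UI. The helper names `describe_table_layout` and `is_clustered_on` are hypothetical, and the stand-in object only mimics the `time_partitioning` and `clustering_fields` attributes that a `google.cloud.bigquery.Table` exposes, so the snippet runs without credentials.

```python
from types import SimpleNamespace


def describe_table_layout(table):
    """Summarise partitioning and clustering of a table-like object.

    Only reads the `time_partitioning` and `clustering_fields` attributes.
    """
    partitioning = getattr(table, "time_partitioning", None)
    clustering = getattr(table, "clustering_fields", None)
    return {
        "partition_field": getattr(partitioning, "field", None),
        "clustering_fields": list(clustering) if clustering else [],
    }


def is_clustered_on(table, expected_fields):
    """Return True if the table is clustered on exactly the expected fields."""
    layout = describe_table_layout(table)
    return layout["clustering_fields"] == list(expected_fields)


# Stand-in for a real Table object (hypothetical, for illustration only).
fake_table = SimpleNamespace(
    time_partitioning=SimpleNamespace(field="date"),
    clustering_fields=["cluster_col"],
)
print(describe_table_layout(fake_table))
print(is_clustered_on(fake_table, ["cluster_col"]))

# Against a real project (needs credentials, so it is left commented out):
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   table = client.get_table(client.dataset("temp").table("clustering_test"))
#   print(describe_table_layout(table))
```

If the real table reports an empty list of clustering fields while the partition field is set, that matches the symptom described in the question.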

I tested with Python 3.6.7. The solution to your problem is to create a clean Python environment and run your code again.
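Before rebuilding the environment, it may help to see which interpreter and package versions the script actually picks up. The following is only a sketch under assumptions: the helper names `installed_version` and `environment_report` are made up for illustration, and on interpreters older than Python 3.8 it relies on the separately installed `importlib_metadata` backport.

```python
import sys

try:
    from importlib import metadata  # standard library from Python 3.8
except ImportError:
    import importlib_metadata as metadata  # backport for older interpreters


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def environment_report(dist_names):
    """Collect the interpreter path and package versions for a sanity check."""
    return {
        "python": sys.version.split()[0],
        "executable": sys.executable,
        "packages": {name: installed_version(name) for name in dist_names},
    }


if __name__ == "__main__":
    report = environment_report(["google-cloud-bigquery", "google-cloud-core"])
    for key, value in report.items():
        print(key, value)
```

Running this inside and outside the virtualenv and comparing the output shows whether a stale or mixed installation is in play.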

4 Comments

I'm afraid there is no such object as Clustering() in the package. Also, as per the documentation, clustering fields are defined as a list directly under job_config.clustering_fields, if I understand it correctly. googleapis.github.io/google-cloud-python/latest/bigquery/…
Tamir, thanks for checking! That's very helpful. At least I know that the code is correct and that something is wrong with my setup. Out of interest, what Python version are you using?
Thank you, I've now created a clean Python installation and virtualenv, and it works for me as well. Thanks a lot for confirming that the code worked for you; it helped me debug in the right direction.
@Dimitri Also important to vote on the answer. Vote up answers that are helpful.... You can check about what to do when someone answers your question - stackoverflow.com/help/someone-answers
