
I have 30 days of data staged in a daily partitioned table in BigQuery. I also have a larger table with 5 years of data, partitioned daily. I need to select from the staging table and replace the entire contents of the corresponding partitions in the larger table for the 30 days covered by the staging table. My preference is to do this in Python, without extracting the data to CSV first and then loading it back into BQ, if I can avoid that. Any suggestions? Thanks in advance.

  • What did you try? What went wrong? Commented Nov 17, 2018 at 0:05
  • I have it working by extracting to CSV files stored in Cloud Storage and then loading each daily CSV with WRITE_TRUNCATE to the table name plus partition decorator. But it seems crazy that I have to pull the data out of BQ, store it in GCS, and then load it back into BQ. I am looking for a way to do this all within BQ but have not found a method to do so. Commented Nov 17, 2018 at 0:14
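
For reference, the CSV round trip the commenter describes looks roughly like this with the client library; the bucket, dataset, and table names below are placeholders, and the $YYYYMMDD decorator addresses one daily partition:

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder names for one day's partition.
stage_table = client.dataset('stage_dataset').table('stage_table$20181101')
gcs_uri = 'gs://my-bucket/stage_20181101.csv'

# 1) Extract the staged partition to a CSV file in Cloud Storage.
client.extract_table(stage_table, gcs_uri).result()

# 2) Load the CSV back, truncating the matching partition in the big table.
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV
job_config.skip_leading_rows = 1  # extracted CSVs include a header row
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
target = client.dataset('target_dataset').table('big_table$20181101')
client.load_table_from_uri(gcs_uri, target, job_config=job_config).result()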

2 Answers


All you need to do is query what you need and set a destination table for the query; with a truncating write disposition, the result replaces the destination's contents.

# Note: this uses the older (pre-0.28) google-cloud-bigquery API.
from google.cloud import bigquery

client = bigquery.Client()
query = """\
SELECT CONCAT(first_name, ' ', last_name) AS full_name,
       FLOOR(DATEDIFF(CURRENT_DATE(), birth_date) / 365) AS age
FROM dataset_name.persons
"""
dataset = client.dataset('dataset_name')
table = dataset.table(name='person_ages')

job = client.run_async_query('fullname-age-query-job', query)
job.destination = table
job.write_disposition = 'WRITE_TRUNCATE'  # replace the destination's contents
job.begin()
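
With the current (0.28+) client library, the same query-with-destination pattern looks roughly like this; the project, dataset, and table names are placeholders, the staging table is assumed to be ingestion-time partitioned, and the $YYYYMMDD decorator targets a single daily partition:

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder names; write the query result over one target partition.
job_config = bigquery.QueryJobConfig()
job_config.destination = client.dataset('target_dataset').table('big_table$20181101')
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE

query = """
SELECT *
FROM `my-project.stage_dataset.stage_table`
WHERE DATE(_PARTITIONTIME) = '2018-11-01'
"""
client.query(query, job_config=job_config).result()  # wait for completion

Run this once per day in the 30-day window to replace each target partition.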

1 Comment

That did not actually work for me, but I do think it is correct, albeit for an older version of the BigQuery client library. Your answer helped tremendously and I will accept it. I am using the most up-to-date library.

That did not actually work for me, but I do think it is correct, albeit for an older version of the BigQuery client library. Your answer helped tremendously and I will accept it. I am using the most up-to-date library. The following worked for me:

from google.cloud import bigquery

gbq = bigquery.Client()
# stage_table_ref, stage_dataset, target_dataset, and table_name are the
# staging/target references set up earlier.
for partition in gbq.list_partitions(stage_table_ref):
    # Address each partition directly with the $YYYYMMDD decorator.
    table_partition = table_name + '$' + partition
    stage_partition = stage_dataset.table(table_partition)
    target_partition = target_dataset.table(table_partition)
    job_config = bigquery.CopyJobConfig()
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
    gbq.copy_table(stage_partition, target_partition, job_config=job_config)
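
A nice side effect of this approach: unlike a SELECT into a destination table, copy jobs don't scan data and don't incur query charges, so looping over 30 partitions this way is cheap.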

