I have 30 days of data staged in a daily-partitioned table in BigQuery. I have a larger table with 5 years of data, also partitioned daily. I need to select from the staging table and replace the entire contents of the corresponding partitions in the larger table for the 30 days in my staging table. My preference is to do this with Python, without extracting the data to a CSV first and then loading it back into BQ, if I can avoid that. Any suggestions? Thanks in advance.
What did you try? What went wrong? – Frederik.L, Nov 17, 2018 at 0:05
I have it working by extracting to CSVs stored in Cloud Storage and then loading each daily file with WRITE_TRUNCATE into table_name$partition. But it seems crazy that I have to pull the data out of BQ, store it in GCS, and then load it back into BQ. I am looking for a way to do this all within BQ but have not found one. – Leo Voskamp, Nov 17, 2018 at 0:14
2 Answers
All you need to do is run the query that selects what you need and set a destination table on the query job, with the write disposition set to truncate:
from google.cloud import bigquery

client = bigquery.Client()

query = """\
SELECT CONCAT(first_name, ' ', last_name) AS full_name,
       FLOOR(DATEDIFF(CURRENT_DATE(), birth_date) / 365) AS age
FROM dataset_name.persons
"""

dataset = client.dataset('dataset_name')
table = dataset.table(name='person_ages')

# Run the query asynchronously, writing the result over the destination table.
job = client.run_async_query('fullname-age-query-job', query)
job.destination = table
job.write_disposition = 'WRITE_TRUNCATE'
job.begin()
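The snippet above uses the pre-0.28 client API (`run_async_query` was later removed). With the current google-cloud-bigquery API, the same idea can be sketched as below; the table name, partition id, and SQL are placeholders of my own, and the destination uses the `$YYYYMMDD` decorator so the job truncates only that one daily partition:

```python
def overwrite_partition(full_table_id, partition_id, source_sql):
    """Run source_sql and WRITE_TRUNCATE the result into one daily
    partition of full_table_id ("project.dataset.table", a placeholder)."""
    # Imported inside the function so the sketch can be loaded without
    # the library or credentials being available.
    from google.cloud import bigquery

    client = bigquery.Client()
    # "table$YYYYMMDD" addresses a single day's partition.
    dest = bigquery.TableReference.from_string(
        "{}${}".format(full_table_id, partition_id))
    job_config = bigquery.QueryJobConfig()
    job_config.destination = dest
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
    client.query(source_sql, job_config=job_config).result()  # wait for completion
```

For example, `overwrite_partition('my-project.prod.sales', '20181117', 'SELECT * FROM my-project.stage.sales')` would replace just the 2018-11-17 partition (names hypothetical).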
1 Comment
Leo Voskamp
That did not actually work for me, but I do think it is correct for an older version of the BigQuery client library. Your answer helped tremendously and I will accept it. I am using the most up-to-date library, and the following worked for me:
# gbq is a bigquery.Client(); stage_table_ref refers to the staging table.
for partition in gbq.list_partitions(stage_table_ref):
    # Address each daily partition with the $YYYYMMDD decorator.
    table_partition = table_name + '$' + partition
    stage_partition = stage_dataset.table(table_partition)
    target_partition = target_dataset.table(table_partition)
    # Copy the staged partition over the target partition, truncating it first.
    job_config = bigquery.CopyJobConfig()
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
    gbq.copy_table(stage_partition, target_partition, job_config=job_config)