I have 30 days of data staged in a daily-partitioned table in BigQuery. I have a larger table with 5 years of data, also partitioned daily. I need to select from the staging table and replace the entire contents of the corresponding partitions in the larger table for the 30 days in my staging table. My preference is to do this with Python, without extracting the data to a CSV first and then loading it back into BQ, if I can avoid that. Any suggestions? Thanks in advance.
What did you try? What went wrong? – Frederik.L, Nov 17, 2018 at 0:05
I have it working by extracting to CSVs stored in Cloud Storage and then loading each daily file with WRITE_TRUNCATE into table_name$partition. But it seems crazy that I have to pull the data out of BQ, store it in GCS, and then load it back into BQ. I am looking for a way to do this all within BQ but have not found one. – Leo Voskamp, Nov 17, 2018 at 0:14
2 Answers
All you need to do is run the query that selects what you need and set a destination table on the query job, with the write disposition set to truncate:
from google.cloud import bigquery

client = bigquery.Client()

query = """\
SELECT CONCAT(first_name, ' ', last_name) AS full_name,
       FLOOR(DATEDIFF(CURRENT_DATE(), birth_date) / 365) AS age
FROM dataset_name.persons
"""

dataset = client.dataset('dataset_name')
table = dataset.table(name='person_ages')

# Run the query asynchronously, writing the result over the destination table.
job = client.run_async_query('fullname-age-query-job', query)
job.destination = table
job.write_disposition = 'WRITE_TRUNCATE'
job.begin()
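The snippet above uses the pre-0.28 client API (`run_async_query` was later removed). With the current google-cloud-bigquery API, the same idea can be sketched as below; the table name, partition id, and SQL are placeholders of my own, and the destination uses the `$YYYYMMDD` decorator so the job truncates only that one daily partition:

```python
def overwrite_partition(full_table_id, partition_id, source_sql):
    """Run source_sql and WRITE_TRUNCATE the result into one daily
    partition of full_table_id ("project.dataset.table", a placeholder)."""
    # Imported inside the function so the sketch can be loaded without
    # the library or credentials being available.
    from google.cloud import bigquery

    client = bigquery.Client()
    # "table$YYYYMMDD" addresses a single day's partition.
    dest = bigquery.TableReference.from_string(
        "{}${}".format(full_table_id, partition_id))
    job_config = bigquery.QueryJobConfig()
    job_config.destination = dest
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
    client.query(source_sql, job_config=job_config).result()  # wait for completion
```

For example, `overwrite_partition('my-project.prod.sales', '20181117', 'SELECT * FROM my-project.stage.sales')` would replace just the 2018-11-17 partition (names hypothetical).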
1 Comment
Leo Voskamp
That did not actually work for me, but I do think it is correct for an older version of the BigQuery client library. Your answer helped tremendously and I will accept it. I am using the most up-to-date library, and the following worked for me:
# gbq is a bigquery.Client(); stage_table_ref refers to the staging table.
for partition in gbq.list_partitions(stage_table_ref):
    # Address each daily partition with the $YYYYMMDD decorator.
    table_partition = table_name + '$' + partition
    stage_partition = stage_dataset.table(table_partition)
    target_partition = target_dataset.table(table_partition)
    # Copy the staged partition over the target partition, truncating it first.
    job_config = bigquery.CopyJobConfig()
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
    gbq.copy_table(stage_partition, target_partition, job_config=job_config)