I've been googling for a long time and I haven't found a way to export my backups (stored in Cloud Storage buckets) to BigQuery without doing it manually...

Is it possible to do this?

Thanks a lot!

1 Answer


You should be able to do so via the BigQuery Python API.

First you need to establish a connection to the BigQuery service. Here's the code I use to do so:

import httplib2
from apiclient.discovery import build
from oauth2client.client import SignedJwtAssertionCredentials

class BigqueryAdapter(object):
    def __init__(self, **kwargs):
        self._project_id = kwargs['project_id']
        self._key_filename = kwargs['key_filename']
        self._account_email = kwargs['account_email']
        self._dataset_id = kwargs['dataset_id']
        self.connector = None
        self.start_connection()

    def start_connection(self):
        # Read the service account's private key and build an authorized
        # BigQuery v2 service object.
        key = None
        with open(self._key_filename) as key_file:
            key = key_file.read()
        credentials = SignedJwtAssertionCredentials(self._account_email,
                                                    key,
                                                    ('https://www.googleapis' +
                                                     '.com/auth/bigquery'))
        authorization = credentials.authorize(httplib2.Http())
        self.connector = build('bigquery', 'v2', http=authorization)
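
You would then instantiate the adapter with your own project values, for example (a hypothetical sketch; every value below is a placeholder):

adapter = BigqueryAdapter(project_id='my-project-id',
                          key_filename='/path/to/key.pem',
                          account_email='xxx@developer.gserviceaccount.com',
                          dataset_id='my_dataset')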

After that you can run jobs using self.connector (in this answer you will find a few examples).
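
For instance, a simple synchronous query could look roughly like this (a minimal sketch; the query is just an illustration):

result = adapter.connector.jobs().query(
    projectId='my-project-id',
    body={'query': 'SELECT 17'}).execute()
print(result)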

To load your backups from Google Cloud Storage you would have to define the job configuration like so:

body = "configuration": {
  "load": {
    "sourceFormat": #Either "CSV", "DATASTORE_BACKUP", "NEWLINE_DELIMITED_JSON" or "AVRO".
    "fieldDelimiter": "," #(if it's comma separated)
    "destinationTable": {
      "projectId": #your_project_id
      "tableId": #your_table_to_save_the_data
      "datasetId": #your_dataset_id
    },
    "writeDisposition": #"WRITE_TRUNCATE" or "WRITE_APPEND"
    "sourceUris": [
        #the path to your backup in google cloud storage. it could be something like "'gs://bucket_name/filename*'. Notice you can use the '*' operator.
    ],
    "schema": { # [Optional] The schema for the destination table. The schema can be omitted if the destination table already exists, or if you're loading data from Google Cloud Datastore.
      "fields": [ # Describes the fields in a table.
        {
          "fields": [ # [Optional] Describes the nested schema fields if the type property is set to RECORD.
            # Object with schema name: TableFieldSchema
          ],
          "type": "A String", # [Required] The field data type. Possible values include STRING, BYTES, INTEGER, FLOAT, BOOLEAN, TIMESTAMP or RECORD (where RECORD indicates that the field contains a nested schema).
          "description": "A String", # [Optional] The field description. The maximum length is 16K characters.
          "name": "A String", # [Required] The field name. The name must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_), and must start with a letter or underscore. The maximum length is 128 characters.
          "mode": "A String", # [Optional] The field mode. Possible values include NULLABLE, REQUIRED and REPEATED. The default value is NULLABLE.
        },
      ],
    },
  },

And then run:

self.connector.jobs().insert(projectId=self._project_id, body=body).execute()
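
Load jobs run asynchronously, so you will usually want to capture the returned job resource and poll it until it reaches the DONE state. A minimal sketch (the polling interval is arbitrary):

import time

job = self.connector.jobs().insert(projectId=self._project_id,
                                   body=body).execute()
job_id = job['jobReference']['jobId']
while True:
    status = self.connector.jobs().get(projectId=self._project_id,
                                       jobId=job_id).execute()
    if status['status']['state'] == 'DONE':
        # 'errorResult' is only present when the job failed.
        print(status['status'].get('errorResult'))
        break
    time.sleep(5)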

Hopefully that's what you were looking for. Let us know if you run into any issues.
