I've been googling for a long time and I haven't found a way to export my backups (stored in Cloud Storage buckets) to BigQuery without doing it manually...

Is it possible to do this?

Thanks a lot!

1 Answer


You should be able to do so via the BigQuery Python API.

First you need to establish a connection to the BigQuery service. Here's the code I use to do so:

import httplib2
from apiclient.discovery import build
from oauth2client.client import SignedJwtAssertionCredentials

class BigqueryAdapter(object):
    def __init__(self, **kwargs):
        self._project_id = kwargs['project_id']
        self._key_filename = kwargs['key_filename']
        self._account_email = kwargs['account_email']
        self._dataset_id = kwargs['dataset_id']
        self.connector = None
        self.start_connection()

    def start_connection(self):
        # Read the service account's private key and build an authorized
        # BigQuery v2 service object.
        key = None
        with open(self._key_filename) as key_file:
            key = key_file.read()
        credentials = SignedJwtAssertionCredentials(self._account_email,
                                                    key,
                                                    ('https://www.googleapis' +
                                                     '.com/auth/bigquery'))
        authorization = credentials.authorize(httplib2.Http())
        self.connector = build('bigquery', 'v2', http=authorization)
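
You would then instantiate the adapter with your own project values, for example (a hypothetical sketch; every value below is a placeholder):

adapter = BigqueryAdapter(project_id='my-project-id',
                          key_filename='/path/to/key.pem',
                          account_email='xxx@developer.gserviceaccount.com',
                          dataset_id='my_dataset')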

After that you can run jobs using self.connector (in this answer you will find a few examples).
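
For instance, a simple synchronous query could look roughly like this (a minimal sketch; the query is just an illustration):

result = adapter.connector.jobs().query(
    projectId='my-project-id',
    body={'query': 'SELECT 17'}).execute()
print(result)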

To load your backups from Google Cloud Storage you would have to define the job configuration like so:

body = "configuration": {
  "load": {
    "sourceFormat": #Either "CSV", "DATASTORE_BACKUP", "NEWLINE_DELIMITED_JSON" or "AVRO".
    "fieldDelimiter": "," #(if it's comma separated)
    "destinationTable": {
      "projectId": #your_project_id
      "tableId": #your_table_to_save_the_data
      "datasetId": #your_dataset_id
    },
    "writeDisposition": #"WRITE_TRUNCATE" or "WRITE_APPEND"
    "sourceUris": [
        #the path to your backup in google cloud storage. it could be something like "'gs://bucket_name/filename*'. Notice you can use the '*' operator.
    ],
    "schema": { # [Optional] The schema for the destination table. The schema can be omitted if the destination table already exists, or if you're loading data from Google Cloud Datastore.
      "fields": [ # Describes the fields in a table.
        {
          "fields": [ # [Optional] Describes the nested schema fields if the type property is set to RECORD.
            # Object with schema name: TableFieldSchema
          ],
          "type": "A String", # [Required] The field data type. Possible values include STRING, BYTES, INTEGER, FLOAT, BOOLEAN, TIMESTAMP or RECORD (where RECORD indicates that the field contains a nested schema).
          "description": "A String", # [Optional] The field description. The maximum length is 16K characters.
          "name": "A String", # [Required] The field name. The name must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_), and must start with a letter or underscore. The maximum length is 128 characters.
          "mode": "A String", # [Optional] The field mode. Possible values include NULLABLE, REQUIRED and REPEATED. The default value is NULLABLE.
        },
      ],
    },
  },

And then run:

self.connector.jobs().insert(projectId=self._project_id, body=body).execute()
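
Load jobs run asynchronously, so you will usually want to capture the returned job resource and poll it until it reaches the DONE state. A minimal sketch (the polling interval is arbitrary):

import time

job = self.connector.jobs().insert(projectId=self._project_id,
                                   body=body).execute()
job_id = job['jobReference']['jobId']
while True:
    status = self.connector.jobs().get(projectId=self._project_id,
                                       jobId=job_id).execute()
    if status['status']['state'] == 'DONE':
        # 'errorResult' is only present when the job failed.
        print(status['status'].get('errorResult'))
        break
    time.sleep(5)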

Hopefully that's what you were looking for. Let us know if you run into any issues.
