Is there a way to load a JSON file from the local file system into BigQuery using the Google BigQuery Client API?

All the options I found are:

1- Streaming the records one by one.

2- Loading JSON data from GCS.

3- Using raw POST requests to load the JSON (i.e. not through Google Client API).

1 Answer

I'm assuming from the python tag that you want to do this from Python. There is a load example here that loads data from a local file. It uses CSV, but it is easy to adapt to JSON, and there is another JSON example in the same directory.
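For the JSON case, the local file has to be newline-delimited JSON: one complete object per line, not a single JSON array. Here is a minimal sketch of preparing such a file, assuming the records are plain dicts keyed by the schema field names used in this answer. The helper names `to_ndjson` and `write_ndjson` are hypothetical and not part of any Google library.

```python
import json


def to_ndjson(records):
    """Serialize an iterable of dicts as newline-delimited JSON text.

    json.dumps emits no raw newlines by default, so each record
    ends up on exactly one line.
    """
    return ''.join(json.dumps(record) + '\n' for record in records)


def write_ndjson(records, path):
    """Write the records to `path`, one JSON object per line."""
    with open(path, 'w') as f:
        f.write(to_ndjson(records))


if __name__ == '__main__':
    # Illustrative rows only; the field names follow the example schema.
    rows = [
        {'string_f': 'a', 'boolean_f': True, 'integer_f': 1, 'float_f': 1.5},
        {'string_f': 'b', 'boolean_f': False, 'integer_f': 2, 'float_f': 2.5},
    ]
    write_ndjson(rows, 'foo.json')
```

The resulting 'foo.json' is the kind of file the upload step in the flow would point at.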

The basic flow is:

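import time

# MediaFileUpload comes from the Google API Python client.
from googleapiclient.http import MediaFileUpload

# Load configuration with the destination specified.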
load_config = {
  'destinationTable': {
    'projectId': PROJECT_ID,
    'datasetId': DATASET_ID,
    'tableId': TABLE_ID
  }
}

load_config['schema'] = {
  'fields': [
    {'name':'string_f', 'type':'STRING'},
    {'name':'boolean_f', 'type':'BOOLEAN'},
    {'name':'integer_f', 'type':'INTEGER'},
    {'name':'float_f', 'type':'FLOAT'},
    {'name':'timestamp_f', 'type':'TIMESTAMP'}
  ]
}
load_config['sourceFormat'] = 'NEWLINE_DELIMITED_JSON'

# This tells it to perform a resumable upload of a local file
# called 'foo.json' 
upload = MediaFileUpload('foo.json',
                         mimetype='application/octet-stream',
                         # This enables resumable uploads.
                         resumable=True)

start = time.time()
job_id = 'job_%d' % start
# Create the job. 'jobs' is assumed to be service.jobs() from an
# authorized BigQuery v2 service object.
result = jobs.insert(
  projectId=PROJECT_ID,
  body={
    'jobReference': {
      'jobId': job_id
    },
    'configuration': {
      'load': load_config
    }
  },
  media_body=upload).execute()

# Then you'd also want to wait for the result and check the status
# (check out the example at the link for more info).
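The waiting step mentioned at the end of the flow could look roughly like the sketch below. It assumes `jobs` is the same `service.jobs()` collection used for the insert, and that the job resource reports `status.state` and, on failure, `status.errorResult`. The function name `wait_for_job`, the polling interval, and the injectable `sleep` argument are illustrative choices, not taken from the linked example.

```python
import time


def wait_for_job(jobs, project_id, job_id, poll_seconds=5, sleep=time.sleep):
    """Poll a BigQuery job until it is DONE, then return the job resource.

    Raises RuntimeError if the finished job carries an errorResult.
    """
    while True:
        job = jobs.get(projectId=project_id, jobId=job_id).execute()
        status = job.get('status', {})
        if status.get('state') == 'DONE':
            if 'errorResult' in status:
                raise RuntimeError(
                    'Load job failed: %s' % status['errorResult'])
            return job
        # Still PENDING or RUNNING: back off before asking again.
        sleep(poll_seconds)
```

After the insert call, something like `wait_for_job(jobs, PROJECT_ID, job_id)` would block until the load finishes or fails.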

2 Comments

Thanks, this worked! I missed the media_body parameter in the docs. It is pretty far down, after all the JSON config options for body :)
If you have the JSON objects (100,000 of them) already loaded in memory, what is the best option for uploading the data if I rule out streaming, loading JSON data from GCS, and using raw POST requests?
