We are facing an intermittent issue where the number of rows returned when we execute a query through the BigQuery Java API does not match the number of rows we get when we run the same query through the BigQuery UI.
In our code we execute the query via a QueryResponse object and check whether the job has completed via the GetQueryResultsResponse.getJobComplete() flag. We also have a pagination mechanism to pull additional records when the query does not return all rows in one shot:

while(queryResult.getRows() != null && queryResult.getTotalRows().compareTo(BigInteger.valueOf(queryResult.getRows().size())) > 0) {
Following is the code we use to execute the query:
int retryCount = 0;
long waitTime = Constant.BASE_WAIT_TIME;
Bigquery bigquery = cloudPlatformConnector.connectBQ();
QueryRequest queryRequest = new QueryRequest();
queryRequest.setUseLegacySql(useLegacyDialect);
GetQueryResultsResponse queryResult = null;
GetQueryResultsResponse queryPaginationResult = null;
String pageToken;
do {
    try {
        // Submit the query and fetch the first page of results.
        QueryResponse query = bigquery.jobs().query(this.projectId, queryRequest.setQuery(querySql)).execute();
        queryResult = bigquery.jobs().getQueryResults(query.getJobReference().getProjectId(), query.getJobReference().getJobId()).execute();

        if (queryResult != null) {
            if (!queryResult.getJobComplete()) {
                LOGGER.info("JobId for the query : " + query.getJobReference().getJobId() + " is Job Completed : " + queryResult.getJobComplete());
                if (queryResult.getErrors() != null) {
                    for (ErrorProto err : queryResult.getErrors()) {
                        LOGGER.info("Errors in query, Reason : " + err.getReason() + " Location : " + err.getLocation() + " Message : " + err.getMessage());
                    }
                }
                LOGGER.info("Query not completed : " + querySql);
                throw new IOException("Query is failing, retrying it");
            }
        }
        LOGGER.info("JobId for the query : " + query.getJobReference().getJobId() + " is Job Completed : " + queryResult.getJobComplete() + " Total rows from query : " + queryResult.getTotalRows());

        // Keep pulling pages until the accumulated rows match totalRows.
        pageToken = queryResult.getPageToken();
        while (queryResult.getRows() != null && queryResult.getTotalRows().compareTo(BigInteger.valueOf(queryResult.getRows().size())) > 0) {
            LOGGER.info("Inside the Pagination code block, Page Token : " + pageToken);
            queryPaginationResult = bigquery.jobs().getQueryResults(projectId, query.getJobReference().getJobId())
                    .setPageToken(pageToken)
                    .setStartIndex(BigInteger.valueOf(queryResult.getRows().size()))
                    .execute();
            queryResult.getRows().addAll(queryPaginationResult.getRows());
            pageToken = queryPaginationResult.getPageToken();
            LOGGER.info("Inside the Pagination code block, total size : " + queryResult.getTotalRows() + " Current Size : " + queryResult.getRows().size());
        }
    } catch (IOException ex) {
        // Retry with exponential backoff up to MAX_RETRY_LIMIT attempts.
        retryCount++;
        LOGGER.info("BQ Connection Attempt " + retryCount + " failed, Retrying in " + waitTime + " ms");
        if (retryCount == Constant.MAX_RETRY_LIMIT) {
            LOGGER.info("BQ Connection Error", ex);
            throw ex;
        }
        try {
            Thread.sleep(waitTime);
        } catch (InterruptedException e) {
            LOGGER.info("Thread Error");
        }
        waitTime *= 2;
    }
} while ((queryResult == null && retryCount < Constant.MAX_RETRY_LIMIT) || (!queryResult.getJobComplete() && retryCount < Constant.MAX_RETRY_LIMIT));
return queryResult.getRows();
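For clarity, here is a minimal sketch of the pageToken-based paging that the loop above is trying to achieve. The fetchAllRows helper and its parameters are placeholders for illustration only; bigquery, projectId, and jobId are assumed to come from the same setup as in the code above, and the sketch assumes the job has already completed.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.GetQueryResultsResponse;
import com.google.api.services.bigquery.model.TableRow;

// Illustrative helper only: keeps requesting pages until no pageToken is returned.
private List<TableRow> fetchAllRows(Bigquery bigquery, String projectId, String jobId) throws IOException {
    List<TableRow> allRows = new ArrayList<>();
    String token = null;
    do {
        Bigquery.Jobs.GetQueryResults request = bigquery.jobs().getQueryResults(projectId, jobId);
        if (token != null) {
            // Ask for the page that follows the one we just read.
            request.setPageToken(token);
        }
        GetQueryResultsResponse page = request.execute();
        if (page.getRows() != null) {
            allRows.addAll(page.getRows());
        }
        token = page.getPageToken();
    } while (token != null);
    return allRows;
}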
The query for which I am not getting all rows does not have a LIMIT clause in it.
We are currently using version 0.5.0 of google-cloud-bigquery.
Thanks in Advance!