We are facing an intermittent issue: when we execute a query through the BigQuery Java API, the number of rows we get doesn't match the number returned when we execute the same query through the BigQuery UI.

In our code, we use a QueryResponse object to execute the query, and we check whether the query has completed via the GetQueryResultsResponse.getJobComplete() flag. We also have a mechanism to pull more records if the query does not return all rows in one shot: while(queryResult.getRows() != null && queryResult.getTotalRows().compareTo(BigInteger.valueOf((queryResult.getRows().size()))) > 0) {

Following is the piece of code which we use for executing the query:

    int retryCount = 0;
    long waitTime = Constant.BASE_WAIT_TIME;
    Bigquery bigquery = cloudPlatformConnector.connectBQ();
    QueryRequest queryRequest = new QueryRequest();
    queryRequest.setUseLegacySql(useLegacyDialect);
    GetQueryResultsResponse queryResult = null;
    GetQueryResultsResponse queryPaginationResult = null;
    String pageToken;
    do{
         try{
               QueryResponse query = bigquery.jobs().query(this.projectId, queryRequest.setQuery(querySql)).execute();
               queryResult = bigquery.jobs().getQueryResults(query.getJobReference().getProjectId(), query.getJobReference().getJobId()).execute();                   
               if(queryResult != null ){
                  if(!queryResult.getJobComplete()){
                      LOGGER.info("JobId for the query : "+ query.getJobReference().getJobId() + " is Job Completed : "+ queryResult.getJobComplete());
                      if(queryResult.getErrors() != null){
                           for( ErrorProto err: queryResult.getErrors() ){
                               LOGGER.info("Errors in query, Reason : "+ err.getReason()+ " Location : "+ err.getLocation() +" Message : "+ err.getMessage());
                           }  
                      }
                       LOGGER.info("Query not completed : "+querySql);
                       throw new IOException("Query is failing retrying it");
                   }
               }
               LOGGER.info("JobId for the query : "+ query.getJobReference().getJobId() + " is Job Completed : "+ queryResult.getJobComplete() + " Total rows from query : " + queryResult.getTotalRows());
               pageToken = queryResult.getPageToken();
               while(queryResult.getRows() != null && queryResult.getTotalRows().compareTo(BigInteger.valueOf((queryResult.getRows().size()))) > 0) {
                   LOGGER.info("Inside the Pagination code block, Page Token : "+pageToken);
                   queryPaginationResult =  bigquery.jobs().getQueryResults(projectId,query.getJobReference().getJobId()).setPageToken(pageToken).setStartIndex(BigInteger.valueOf(queryResult.getRows().size())).execute();
                   queryResult.getRows().addAll(queryPaginationResult.getRows());
                   pageToken = queryPaginationResult.getPageToken();
                   LOGGER.info("Inside the Pagination code block, total size : "+ queryResult.getTotalRows() + " Current Size : "+ queryResult.getRows().size());
               }

         }catch(IOException ex){
               retryCount ++;
               LOGGER.info("BQ Connection Attempt "+retryCount +" failed, Retrying in " + waitTime + " seconds");
               if (retryCount == Constant.MAX_RETRY_LIMIT) {
                    LOGGER.info("BQ Connection Error", ex);
                    throw ex;
               }
               try {
                    Thread.sleep(waitTime);
               } catch (InterruptedException e) {
                    LOGGER.info("Thread Error");
               }
               waitTime *= 2;
         }
    }while((queryResult == null && retryCount < Constant.MAX_RETRY_LIMIT ) || (!queryResult.getJobComplete() && retryCount < Constant.MAX_RETRY_LIMIT));
    return queryResult.getRows();

The query that returns too few rows doesn't have a LIMIT clause in it.

We are currently using version 0.5.0 of google-cloud-bigquery.

Thanks in Advance!

1 Answer

I think that on subsequent calls to getQueryResults, you need to call setPageToken with the pageToken returned by the previous page. Otherwise, getQueryResults will just return the rows from the first page.
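
To illustrate the token chaining described above, here is a minimal, self-contained sketch. It does not call BigQuery: the Page class and the fetchPage function are hypothetical stand-ins for GetQueryResultsResponse and for bigquery.jobs().getQueryResults(projectId, jobId).setPageToken(token).execute(), and the rows are plain strings. The only point is the loop shape, where each call receives the token returned by the previous call and the loop stops when no token comes back.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class PageTokenLoop {

    /** Hypothetical stand-in for one page of a getQueryResults response. */
    public static final class Page {
        final List<String> rows;
        final String nextPageToken; // null when this is the last page

        public Page(List<String> rows, String nextPageToken) {
            this.rows = rows;
            this.nextPageToken = nextPageToken;
        }
    }

    /**
     * Collects all rows by following page tokens.
     * fetchPage is a hypothetical function: it takes the token of the page
     * to fetch (null for the first page) and returns that page.
     */
    public static List<String> fetchAll(Function<String, Page> fetchPage) {
        List<String> all = new ArrayList<>();
        String token = null; // no token on the first request
        do {
            Page page = fetchPage.apply(token);
            if (page.rows != null) {
                all.addAll(page.rows);
            }
            // Always take the token from the page just fetched,
            // never from the first page.
            token = page.nextPageToken;
        } while (token != null);
        return all;
    }

    public static void main(String[] args) {
        // Fake backend with three pages, for demonstration only.
        Function<String, Page> fake = token -> {
            if (token == null) {
                return new Page(Arrays.asList("r1", "r2", "r3"), "p2");
            }
            if (token.equals("p2")) {
                return new Page(Arrays.asList("r4", "r5", "r6"), "p3");
            }
            return new Page(Arrays.asList("r7"), null);
        };
        System.out.println(fetchAll(fake).size()); // prints 7
    }
}
```

Ending the loop on a missing token, rather than on a comparison between getTotalRows() and the accumulated row count, is a choice made for this sketch; it avoids depending on the two counters agreeing.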

7 Comments

Thanks, Nguyen. I tried this but with no success; I am facing the same problem. It looks like my flow never goes into the while block that pulls more records in case of pagination. I have also updated my code above so that it takes the pageToken into account.
I saw that you added .setPageToken(queryResult.getPageToken()) to the while loop. However, wouldn't queryResult.getPageToken() always be the page token of the first page? I think you need to take the page token from every getQueryResults call and pass it into the next call. Also, can you provide some more info on: (1) the total number of rows you are expecting, (2) the number of rows fetched by your code, and (3) the info logging output from it?
I get what you are saying. I can update my code, but I am not seeing any of the log statements that I put inside the while loop, which makes me think this issue is not related to pagination. (1) The total number of rows I am expecting varies daily. For Oct 14, I was expecting 3978 but got only 3972. The difference is not always that small; sometimes I see a difference of more than 500 rows.
Based on your numbers, queryResult.getTotalRows() should return 3978 and queryResult.getRows().size() should return 3972. Since 3978 > 3972, the condition is satisfied and it should have entered the while loop, right? Is my understanding correct, or did I miss something?
I am not logging queryResult.getTotalRows() anywhere in my code. The 3978 rows are what I got when I executed the same query through the BigQuery UI, but my code returned only 3972 records. The reason I say my code flow is not entering the while loop is that when I faced this issue, I checked the logs and didn't see any log statements from the while block.