
I'm struggling to transfer SQL query results into R after successfully retrieving data in RStudio's SQL Results tab.

What I've Done So Far

  • Connected to the database via RStudio's "Connections" tab.
  • Executed SQL queries using the SQL editor, successfully previewing results:
-- !preview conn=con
SELECT xdailyissues.issue_date, xdailyissues.drugfull, xdailyissues.quantity_container, 
       xdailyissues.quantity_doseunits, xdailyissues.issue_type, xdailyissues.patient_hospitalno, 
       xdailyissues.costcentre, xdailyissues.lnkdid
FROM %PARALLEL JAC_Super.xdailyissues xdailyissues
WHERE (xdailyissues.issue_date >= {d '2025-04-01'} AND xdailyissues.issue_date <= {d '2025-04-30'})
  • The SQL Results tab shows my query output correctly.
  • However, I can't find a way to transfer these results into an R variable.

Attempts That Didn't Work

I've tried various approaches, including:

query <- "SELECT * FROM %PARALLEL JAC_Super.xdailyissues xdailyissues WHERE (xdailyissues.issue_date >= '2025-04-01' AND xdailyissues.issue_date <= '2025-04-30')"
issues <- dbGetQuery(con, query)

… but I keep getting ODBC errors.
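One variant I have not been able to verify yet is building the date literals with DBI's sqlInterpolate() instead of the ODBC {d ...} escapes. DBI::ANSI() stands in for a live connection below, so the snippet runs without a database; with the real con, I would pass con instead:

```r
library(DBI)

# Build the query with safely quoted date literals instead of {d '...'} escapes.
# ANSI() is a connection-less ANSI-SQL dialect object, used here only so the
# interpolation can be tried without a live database.
query <- sqlInterpolate(
  ANSI(),
  "SELECT * FROM %PARALLEL JAC_Super.xdailyissues xdailyissues
   WHERE xdailyissues.issue_date >= ?from AND xdailyissues.issue_date <= ?to",
  from = "2025-04-01",
  to   = "2025-04-30"
)
# issues <- dbGetQuery(con, query)   # with the live connection
```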

Question

How can I transfer the SQL query results that are already captured in the SQL Results tab into an R dataframe, without re-executing the query manually?

I want to avoid unnecessary external data refreshes, especially given Power BI’s unpredictable behaviour (see Power BI Loading Large Tables Unnecessarily – How to Prevent Unwanted Refreshes?). Any guidance would be greatly appreciated!

Error messages received:

Error in `dbGetQuery()`:
! ODBC failed with error HY000 from [Iris ODBC][State : HY000][Native Code 400].
✖ 
• [C:\Program Files\RStudio\resources\app\bin\rsession-utf8.exe]
• [SQLCODE: <-400>:<Fatal error occurred>]
• [Error: <<UNDEFINED>Compile+17^%SYS.SQLSRV *%qinfo("type")>]
• [Location: <Prepare>]
• <SQL> '-- !preview conn=con SELECT "xdailyissues"."issue_date", "xdailyissues"."drugfull",
  "xdailyissues"."quantity_container", "xdailyissues"."quantity_doseunits", "xdailyissues"."issue_type",
  "xdailyissues"."patient_hospitalno", "xdailyissues"."costcentre", "xdailyissues"."lnkdid" FROM %PARALLEL
  "JAC_Super"."xdailyissues" "xdailyissues" WHERE ("xdailyissues"."issue_date">={d '2025-04-01'} AND
  "xdailyissues"."issue_date"<={d '2025-04-30'}) '
ℹ From nanodbc/nanodbc.cpp:1726.
Run `rlang::last_trace()` to see where the error occurred.
> query <- "SELECT xdailyissues.issue_date, xdailyissues.drugfull, xdailyissues.quantity_container, xdailyissues.quantity_doseunits, xdailyissues.issue_type, xdailyissues.patient_hospitalno, xdailyissues.costcentre, xdailyissues.lnkdid
+ FROM %PARALLEL JAC_Super.xdailyissues xdailyissues
+ WHERE (xdailyissues.issue_date >= {d '2025-04-01'} AND xdailyissues.issue_date <= {d '2025-04-30'})"
> issues <- dbGetQuery(con, query)
Error: nanodbc/nanodbc.cpp:2856:  2201
[Iris ODBC][State :  22018 ][Native Code 22005]
[C:\Program Files\RStudio\resources\app\bin\rsession-utf8.exe]
Error in assignment 
Warning message:
In dbClearResult(rs) : Result already cleared
> query <- "SELECT * FROM %PARALLEL JAC_Super.xdailyissues xdailyissues WHERE (CAST(xdailyissues.issue_date AS DATE) >= '2025-04-01' AND CAST(xdailyissues.issue_date AS DATE) <= '2025-04-30')"
> issues <- dbGetQuery(con, query)
Error in `dbGetQuery()`:
! ODBC failed with error from .
✖ 
• <SQL> 'SELECT * FROM %PARALLEL JAC_Super.xdailyissues xdailyissues WHERE (CAST(xdailyissues.issue_date AS DATE) >=
  '2025-04-01' AND CAST(xdailyissues.issue_date AS DATE) <= '2025-04-30')'
ℹ From nanodbc/nanodbc.cpp:1726.
Run `rlang::last_trace()` to see where the error occurred.

Please note that I managed to pull a table with the following code in RStudio before. I wasn't able to repeat this, however, and received the error messages above with my later attempts.

    query <- "SELECT \"xdailyissues\".\"issue_date\", \"xdailyissues\".\"drugfull\", \"xdailyissues\".\"quantity_container\", \"xdailyissues\".\"quantity_doseunits\", \"xdailyissues\".\"issue_type\", \"xdailyissues\".\"patient_hospitalno\", \"xdailyissues\".\"costcentre\", \"xdailyissues\".\"lnkdid\"
      FROM %PARALLEL \"JAC_Super\".\"xdailyissues\" \"xdailyissues\"
      WHERE (\"xdailyissues\".\"issue_date\" >= {d '2024-04-01'} AND \"xdailyissues\".\"issue_date\" <= {d '2025-03-31'})
      AND (\"xdailyissues\".\"costcentre\" LIKE 'Q4A%' OR \"xdailyissues\".\"costcentre\" LIKE 'Q4C%' OR \"xdailyissues\".\"costcentre\" LIKE 'Q4D%')
      AND (\"xdailyissues\".\"drugfull\" LIKE 'Zopiclone%' OR \"xdailyissues\".\"drugfull\" LIKE 'Zolpidem%')
    "
  • Could you edit your question and include DBI / ODBC errors? SQL preview fetches a limited number of records - github.com/rstudio/rstudio/blob/main/src/cpp/session/modules/… - while dbGetQuery fetches everything. Commented May 3 at 11:20
  • (1) "but I keep getting ODBC errors", there are a lot of possible errors, some from ODBC (system), some from odbc (R package), some from DBI (R package). Please include the literal text of warnings and errors, I don't want to try to guess. (2) It seems unlikely this is related to the RStudio IDE. Unless you can demonstrate that it works in Rgui and not in RStudio, the rstudio tag is inappropriate. Similarly for powerbi: if you can demonstrate things work in R/RStudio and they don't work in PBI, then the tag is appropriate. Commented May 3 at 16:53
  • My assumption is that your con is not formed correctly in R, but not knowing the rest of your PBI doc, it's a little hard to know for sure. Commented May 3 at 16:54
  • 1
    I now added some of the error messages in RStudio I received above. Please note that I managed to pull data a few times with dbGetQuery, yet not consistently. I also managed to use the same SQL code successfully with PowerBI. Both PowerBI and RStudio use the exact same ODBC user DSN. Commented May 3 at 20:15
  • Is it possible that some rows of data are malformed (or at least surprising to the code)? For example, maybe you're casting missing values or something. That would explain why you can sometimes get data. Try making the SQL query take a very tiny number of rows, or only select one reliable column. Commented May 4 at 0:30

1 Answer

I successfully pulled large tables from an InterSystems IRIS database using the following R code:

library(DBI)    # dbSendQuery(), dbFetch(), dbClearResult()
library(dplyr)  # bind_rows()

chunkLoader <- function(sql_query, chunk_size = 100000,
                        connection = con) {
  
  # Step 1: Modify SQL query to count total rows
  countQuery <- paste("SELECT COUNT(*) AS row_count FROM (", 
                      sql_query, 
                      ") AS subquery")
  
  # Get total number of rows
  queryTotalRows <- dbSendQuery(connection, countQuery)
  totalRowsDataFrame <- dbFetch(queryTotalRows)
  dbClearResult(queryTotalRows)  # release the count result set before the main query
  totalRows <- totalRowsDataFrame$row_count |> as.numeric()
  
  if (is.na(totalRows) || totalRows == 0) {
    stop("No rows found or unable to retrieve row count.")
  }
  
  # Step 2: Initialise query execution
  issuesDirectQuery <- dbSendQuery(connection, sql_query)  # use the connection argument, not the global con
  
  allData <- list()
  maxIterations <- ceiling(totalRows / chunk_size)
  
  # Step 3: Fetch data in chunks
  for (iteration in seq_len(maxIterations)) {
    remaining_rows <- totalRows - ((iteration - 1) * chunk_size)
    fetch_size <- min(chunk_size, remaining_rows)
    
    # Display progress message in the console:
    cat(sprintf("\rRetrieving chunk %d of %d ...", iteration, maxIterations))
    flush.console()
    
    chunk <- dbFetch(issuesDirectQuery, n = fetch_size)
    allData[[length(allData) + 1]] <- chunk
  }
  
  # Step 4: Combine all chunks into a single data frame
  final_data <- bind_rows(allData)
  
  # Step 5: Clean up
  dbClearResult(issuesDirectQuery)
  
  return(final_data)
}

The key challenge was the final chunk: with this driver, asking dbFetch() for more rows (n) than remain in the result set raised an error. To avoid this, the function first determines the total number of rows with a COUNT(*) query, then sizes each fetch so it never requests more rows than are left.

While the approach works reliably for millions of rows, it is slow. Nonetheless, it represents a significant improvement over previous attempts.
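To sanity-check the chunk-size arithmetic without a database, the same loop can be exercised against a plain in-memory data frame; makeFakeFetch() below is a hypothetical stand-in for dbFetch() on a result-set cursor:

```r
# Stand-in for dbFetch(): hands out up to `n` rows from a local data frame,
# remembering its position between calls (mimics a result-set cursor).
makeFakeFetch <- function(df) {
  pos <- 0
  function(n) {
    rows <- seq_len(min(n, nrow(df) - pos)) + pos
    pos <<- pos + length(rows)
    df[rows, , drop = FALSE]
  }
}

totalRows  <- 250          # pretend the COUNT(*) query returned this
chunk_size <- 100
fakeFetch  <- makeFakeFetch(data.frame(id = seq_len(totalRows)))

allData <- list()
maxIterations <- ceiling(totalRows / chunk_size)   # 3 chunks: 100, 100, 50
for (iteration in seq_len(maxIterations)) {
  remaining_rows <- totalRows - (iteration - 1) * chunk_size
  fetch_size <- min(chunk_size, remaining_rows)    # never over-request
  allData[[iteration]] <- fakeFetch(fetch_size)
}
final_data <- do.call(rbind, allData)
nrow(final_data)   # 250
```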


1 Comment

Do you really need to extract millions of rows over and over again? Can't you just extract the rows that are new or might have changed?
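The incremental pull the comment above suggests can be sketched with plain data frames; serverTable and fetchSince() below are hypothetical stand-ins for the remote table and a dbGetQuery() with a WHERE issue_date > ? filter:

```r
# Hypothetical stand-ins: serverTable plays the remote table, fetchSince()
# plays a parameterized dbGetQuery() filtering on issue_date.
serverTable <- data.frame(
  issue_date = as.Date("2025-04-01") + 0:9,
  quantity   = 1:10
)
fetchSince <- function(last_date) {
  serverTable[serverTable$issue_date > last_date, , drop = FALSE]
}

# Initial full load, then an incremental refresh that only pulls new rows.
cache <- fetchSince(as.Date("1900-01-01"))          # all 10 rows
serverTable <- rbind(serverTable,
                     data.frame(issue_date = as.Date("2025-04-11"),
                                quantity   = 11))   # a new row appears upstream
newRows <- fetchSince(max(cache$issue_date))        # fetches only the new row
cache <- rbind(cache, newRows)
nrow(cache)   # 11
```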
