1

I've been working on this for hours and can't find a solution that works. For simplicity let us say the setup is that we have a DataFrame in R, let's call it df, with a single column of values and let's say it has values 1,2,3,4 and 5.

I want to embed the following query in R:

 SELECT DISTINCT column1,
                 column 2 
 FROM database 
 WHERE value IN (1,2,3,4,5)

If I embed the query in R via the following

df5 <- dbGetQuery(db,paste0("

SELECT DISTINCT column1,
                 column 2 
 FROM database 
 WHERE value IN (1,2,3,4,5)"))

then my query works, but I want to reference this DataFrame which I am pulling in from an Excel file. The natural thing to do would be to convert it to a list

val_list=list(df$'values')

and do the following

df5 <- dbGetQuery(db,paste0("

SELECT DISTINCT column1,
                 column 2 
 FROM database 
 WHERE value IN '",vals,"))

This is, however, not working. How can I get it to work as I want?

2 Answers 2

1

One should never interpolate data directly into the query, lest accidental sql-injection (or query-poisoning) occurs. It's better to use bound parameters or similar, see https://db.rstudio.com/best-practices/run-queries-safely/.

For this code, assuming you have a set of values you want to check the IN set membership:

#?# vec <- df$values
qmarks <- paste(rep("?", length(vec)), collapse = ",")
df5 <- dbGetQuery(db, paste("
    SELECT DISTINCT column1, column 2 
    FROM database 
    WHERE value IN (", qmarks, ")"),
  params = as.list(vec))
Sign up to request clarification or add additional context in comments.

6 Comments

The advice not to use string interpolation is for situations interfacing to publicly facing sites only. In typical R applications which are more internal computational this advice does not apply.
I disagree. The rationale may be for more practical reasons initially, but the end result is the same. Further, "internal" may feel safe enough until a quick hack turns into a small shiny app shared with a coworker that inadvertently or eventually migrates to "public" (or public-enough). I've seen it happen several times, and when this type of bug comes up, it carries some of the same risks. Doing parameter binding is not hard, why not teach it from day one?
IMO in some cases of adHoc R scripts the parameter concatenation may be considered tolerable but is must be always considered as a potential security and performance thread. BTW (in RJDBC) I had to use do.call to pass mutiple bind parameters, e.g. do.call(dbSendUpdate, c(jdbcConnection, sql, df$col1, df$col2) )
@MarmiteBomber, thanks for the comment. I don't know when "performance" (speed of execution) would be the prime motivator, but I won't argue that inefficient query-formation could easily slow down the query. As for the required use of do.call, I don't see how it would be required using the method in my answer here. If you see problems using bound parameters (even with RJDBC, though I admit to not using JDBC), I'd be curious to know.
You are welcome. The performance problem is basically in the need of parsing on each execution. The concatenated parameters disable the reuse of parsed cursor (in Oracle). The problem with d.call - I got missing IN- or OUT-parameter at index:: 2 while running dbSendUpdate(conn, "update test set val = '0' where col1 in (?,?)", params = as.list(c(1,2)) ) your syntax works for me with only one bind paramater
|
0

Suppose we wish to insert the Time column from the built in data frame BOD into an sql query.

sprintf("select * from X where Y in (%s)", toString(BOD$Time))
## [1] "select * from X where Y in (1, 2, 3, 4, 5, 7)"

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.