Dynamically insert R output in SQL table

Question

I have almost no experience with SQL statements, so I apologize if this is a naive question. Say I have an SQL table results with the columns b1, b2, b3, and b4, and I have R output dat that corresponds to these columns and looks like this:

print(dat)
b1  b2  b3  b4
7   8   7   1

So I could run an SQL statement that looks something like:

a <- paste("INSERT INTO `results` (`b1`,`b2`,`b3`,`b4`) VALUES ", "(", dat$b1, ",", dat$b2, ",", dat$b3, ",", dat$b4, ")", ";", sep="")
for (i in seq_along(a)) {
  query(a[i])
}

This works correctly; however, it is not dynamic. dat (i.e., the R output) will not always contain every column found in results (i.e., the database table), though it will never have columns that results lacks (e.g., in this case dat will never have a b5 column). I am trying to write the code dynamically so that I don't have to spell out all of the dat columns and results column names, and so that, regardless of their order, the columns in dat go into the corresponding columns in results. Finally, if a column is missing from dat, an NA should go into the corresponding column in results. For example, if dat looked like:

print(dat)
b4  b1
7   8

results would look like:

b1  b2  b3  b4
8   NA  NA  7

Thank you!

If dat does not contain a column, the value returned will be NULL, e.g. dat$b2 and dat$b3 will return NULL. Can this be used instead of NA?
– Verena Praher, Commented Sep 14, 2015 at 17:09
Then your query should work for every situation. Didn't you try that?
– Verena Praher, Commented Sep 14, 2015 at 17:32
Again, I am trying to do this without writing out all of the column names in dat and all of the column names in results, so my current code does not solve the issue.
– costebk08, Commented Sep 14, 2015 at 17:39

hrbrmstr · Accepted Answer · 2015-09-14 17:23:32Z

You can parameterize it in a pretty straightforward manner (and you can wrap the functionality below into a function for easier use):

dat <- mtcars

inserts <- sprintf("INSERT INTO `%s` (%s) VALUES (%s);",
        "results",
        paste(sprintf("`%s`", colnames(dat)), collapse=", "),
        sapply(1:nrow(dat), function(i) {
          paste(sprintf("`%s`", unlist(dat[i,], use.names=FALSE)) , collapse=", ")
        }))

head(inserts)
## [1] "INSERT INTO `results` (`mpg`, `cyl`, `disp`, `hp`, `drat`, `wt`, `qsec`, `vs`, `am`, `gear`, `carb`) VALUES (`21`, `6`, `160`, `110`, `3.9`, `2.62`, `16.46`, `0`, `1`, `4`, `4`);"    
## [2] "INSERT INTO `results` (`mpg`, `cyl`, `disp`, `hp`, `drat`, `wt`, `qsec`, `vs`, `am`, `gear`, `carb`) VALUES (`21`, `6`, `160`, `110`, `3.9`, `2.875`, `17.02`, `0`, `1`, `4`, `4`);"   
## [3] "INSERT INTO `results` (`mpg`, `cyl`, `disp`, `hp`, `drat`, `wt`, `qsec`, `vs`, `am`, `gear`, `carb`) VALUES (`22.8`, `4`, `108`, `93`, `3.85`, `2.32`, `18.61`, `1`, `1`, `4`, `1`);"  
## [4] "INSERT INTO `results` (`mpg`, `cyl`, `disp`, `hp`, `drat`, `wt`, `qsec`, `vs`, `am`, `gear`, `carb`) VALUES (`21.4`, `6`, `258`, `110`, `3.08`, `3.215`, `19.44`, `1`, `0`, `3`, `1`);"
## [5] "INSERT INTO `results` (`mpg`, `cyl`, `disp`, `hp`, `drat`, `wt`, `qsec`, `vs`, `am`, `gear`, `carb`) VALUES (`18.7`, `8`, `360`, `175`, `3.15`, `3.44`, `17.02`, `0`, `0`, `3`, `2`);" 
## [6] "INSERT INTO `results` (`mpg`, `cyl`, `disp`, `hp`, `drat`, `wt`, `qsec`, `vs`, `am`, `gear`, `carb`) VALUES (`18.1`, `6`, `225`, `105`, `2.76`, `3.46`, `20.22`, `1`, `0`, `3`, `1`);"

dat <- iris

inserts <- sprintf("INSERT INTO `%s` (%s) VALUES (%s);",
        "results",
        paste(sprintf("`%s`", colnames(dat)), collapse=", "),
        sapply(1:nrow(dat), function(i) {
          paste(sprintf("`%s`", unlist(dat[i,], use.names=FALSE)) , collapse=", ")
        }))

head(inserts)
## [1] "INSERT INTO `results` (`Sepal.Length`, `Sepal.Width`, `Petal.Length`, `Petal.Width`, `Species`) VALUES (`5.1`, `3.5`, `1.4`, `0.2`, `1`);"
## [2] "INSERT INTO `results` (`Sepal.Length`, `Sepal.Width`, `Petal.Length`, `Petal.Width`, `Species`) VALUES (`4.9`, `3`, `1.4`, `0.2`, `1`);"  
## [3] "INSERT INTO `results` (`Sepal.Length`, `Sepal.Width`, `Petal.Length`, `Petal.Width`, `Species`) VALUES (`4.7`, `3.2`, `1.3`, `0.2`, `1`);"
## [4] "INSERT INTO `results` (`Sepal.Length`, `Sepal.Width`, `Petal.Length`, `Petal.Width`, `Species`) VALUES (`4.6`, `3.1`, `1.5`, `0.2`, `1`);"
## [5] "INSERT INTO `results` (`Sepal.Length`, `Sepal.Width`, `Petal.Length`, `Petal.Width`, `Species`) VALUES (`5`, `3.6`, `1.4`, `0.2`, `1`);"  
## [6] "INSERT INTO `results` (`Sepal.Length`, `Sepal.Width`, `Petal.Length`, `Petal.Width`, `Species`) VALUES (`5.4`, `3.9`, `1.7`, `0.4`, `1`);"

set.seed(1492)
dat <- data.frame(b1=sample(10, 10),
                  b2=sample(10, 10),
                  b3=sample(10, 10),
                  b4=sample(10, 10))

inserts <- sprintf("INSERT INTO `%s` (%s) VALUES (%s);",
        "results",
        paste(sprintf("`%s`", colnames(dat)), collapse=", "),
        sapply(1:nrow(dat), function(i) {
          paste(sprintf("`%s`", unlist(dat[i,], use.names=FALSE)) , collapse=", ")
        }))

head(inserts)
## [1] "INSERT INTO `results` (`b1`, `b2`, `b3`, `b4`) VALUES (`3`, `7`, `7`, `2`);" 
## [2] "INSERT INTO `results` (`b1`, `b2`, `b3`, `b4`) VALUES (`2`, `6`, `4`, `9`);" 
## [3] "INSERT INTO `results` (`b1`, `b2`, `b3`, `b4`) VALUES (`9`, `2`, `2`, `7`);" 
## [4] "INSERT INTO `results` (`b1`, `b2`, `b3`, `b4`) VALUES (`1`, `4`, `5`, `10`);"
## [5] "INSERT INTO `results` (`b1`, `b2`, `b3`, `b4`) VALUES (`7`, `10`, `1`, `6`);"
## [6] "INSERT INTO `results` (`b1`, `b2`, `b3`, `b4`) VALUES (`6`, `9`, `10`, `4`);"

That said, there may be better ways of getting this data back into a database if we knew more about the problem you're really trying to solve.

This looks like a great solution, but I can't quite get it to work. When I query inserts I get the following error: Error in .local(conn, statement, ...) : could not run statement: Unknown column '-0.0366528160' in 'field list'. That value is from the first column, so it looks like it is being treated as a column name. Any thoughts? Thank you.
I have no idea what database you're using, how the schema is set up, or what SQL syntax it accepts. I was just working with the example you provided.
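
The error reported in the comment above is consistent with how MySQL reads backticks: they quote identifiers (table and column names), so a value wrapped in backticks is parsed as a column name. A minimal sketch of a wrapper that backtick-quotes only the identifiers and writes the values as SQL literals might look like the following. The names sql_literal and make_inserts are hypothetical, MySQL-style quoting is assumed, and the string escaping is deliberately simple, so for untrusted input a DBI quoting function or a parameterized query would be the safer choice.

```r
# Hypothetical helper: turn one R vector into SQL literals (MySQL-style assumed).
sql_literal <- function(x) {
  if (is.factor(x)) x <- as.character(x)  # use factor labels, not integer codes
  if (is.character(x)) {
    # Double any embedded single quote, then wrap the string in single quotes
    out <- paste0("'", gsub("'", "''", x, fixed = TRUE), "'")
  } else {
    # Numbers and logicals are written bare
    out <- as.character(x)
  }
  out[is.na(x)] <- "NULL"  # R's NA becomes SQL NULL
  out
}

# Hypothetical wrapper: one INSERT statement per row of dat.
make_inserts <- function(dat, table) {
  cols <- paste(sprintf("`%s`", colnames(dat)), collapse = ", ")
  # Convert column by column so each column keeps its own type
  lits <- lapply(dat, sql_literal)
  # Paste the columns together row-wise, separated by ", "
  vals <- do.call(paste, c(unname(lits), sep = ", "))
  sprintf("INSERT INTO `%s` (%s) VALUES (%s);", table, cols, vals)
}

dat <- data.frame(b4 = 7, b1 = 8)
make_inserts(dat, "results")
## [1] "INSERT INTO `results` (`b4`, `b1`) VALUES (7, 8);"
```

Because the column list is taken from colnames(dat), the order of the columns in dat does not matter, and any table column that dat omits is left at its default (typically NULL) by the database.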

bramtayl · Accepted Answer · 2015-09-14 18:22:23Z

Dunno if you have a huge database, but an easy fix is just to read the dataset into R, append a dataset (for example, using dplyr::bind_rows), and then write the whole thing out again.

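library(RMySQL)
library(dplyr)

# Connect, read the existing table, append dat, and overwrite the table
con = dbConnect(RMySQL::MySQL(), dbname = "test")
con %>%
  dbReadTable("results") %>%
  bind_rows(dat) %>%   # columns missing from dat are filled with NA
  dbWriteTable(con, "results", . , overwrite = TRUE)
dbDisconnect(con)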

Or

con %>% dbWriteTable("results", dat, append = TRUE)

To create the table,

con %>% dbWriteTable("results", dat)
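
To cover the case from the question where dat has only some of the table's columns, possibly in a different order, one option is to align dat to the table's fields before appending. The sketch below uses a hypothetical helper named align_to_fields and assumes the field names would come from DBI's dbListFields(con, "results"). The database calls are left as comments because they need a live connection, and whether NA columns append cleanly may depend on the driver.

```r
# Hypothetical helper: make dat match the table's columns, in the table's order.
align_to_fields <- function(dat, fields) {
  extra <- setdiff(names(dat), fields)
  if (length(extra) > 0) {
    stop("columns not in table: ", paste(extra, collapse = ", "))
  }
  # Add each missing column as NA
  for (f in setdiff(fields, names(dat))) {
    dat[[f]] <- rep(NA, nrow(dat))
  }
  # Reorder to match the table's column order
  dat[, fields, drop = FALSE]
}

dat <- data.frame(b4 = 7, b1 = 8)
aligned <- align_to_fields(dat, c("b1", "b2", "b3", "b4"))
print(aligned)
##   b1 b2 b3 b4
## 1  8 NA NA  7

# With a live connection (not run here):
# fields <- dbListFields(con, "results")
# dbWriteTable(con, "results", align_to_fields(dat, fields),
#              append = TRUE, row.names = FALSE)
```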

edited Sep 14, 2015 at 18:22

answered Sep 14, 2015 at 17:09

4 Comments

costebk08 Over a year ago

So how exactly would this look?

costebk08 Over a year ago

Yeah I probably don't want to read the entire database into R every time, and I can't get this solution to work. I am assuming it is because I don't currently have data in the database table, and it looks like this solution assumes data already exists there.

bramtayl Over a year ago

There is also an append option in dbWriteTable but it might not work if you want to add to columns that don't exist yet (definitely try it out though). The first time you write out the table, just do dbWriteTable(con, "results", dat)

costebk08 Over a year ago

Thank you for the info, definitely useful!
