I have almost no experience with SQL, so I apologize if this question is naive. Suppose I have an SQL table results with the columns b1, b2, b3 and b4, and R output dat that corresponds to these columns and looks like:

print(dat)
b1  b2  b3  b4
7   8   7   1

I could then run an SQL statement that looks something like:

# Build one INSERT statement per row of dat, then send each to the database
a <- paste("INSERT INTO `results` (`b1`,`b2`,`b3`,`b4`) VALUES ", "(", dat$b1, ",", dat$b2, ",", dat$b3, ",", dat$b4, ")", ";", sep="")
for (i in seq_along(a)) {
  query(a[i])
}

This works, but it is not dynamic. dat (the R output) will not always contain every column found in results (the database table), although it will never contain a column that results lacks (e.g., in this case dat will never have a b5 column). I am trying to write the code dynamically so that I don't have to spell out all the column names of dat and results. Regardless of their order, the columns in dat should go into the corresponding columns of results, and if a column is missing from dat, an NA should go into the corresponding column of results. For example, if dat looked like:

print(dat)
b4  b1
7   8

results would look like:

b1  b2  b3  b4
8   NA  NA  7

Thank you!

4 Comments
  • If dat does not contain a column, the value returned will be NULL; e.g., dat$b2 and dat$b3 will return NULL. Can this be used instead of NA? Commented Sep 14, 2015 at 17:09
  • Yes, NULL is completely fine. Commented Sep 14, 2015 at 17:15
  • Then your query should work for every situation. Didn't you try that? Commented Sep 14, 2015 at 17:32
  • Again, I am trying to avoid writing out all of the column names in dat and all of the column names in results, so my current code does not solve the issue. Commented Sep 14, 2015 at 17:39

2 Answers

You can parameterize it in a pretty straightforward manner (and you can wrap the functionality below into a function for easier use):

dat <- mtcars

inserts <- sprintf("INSERT INTO `%s` (%s) VALUES (%s);",
        "results",
        paste(sprintf("`%s`", colnames(dat)), collapse=", "),
        sapply(1:nrow(dat), function(i) {
          paste(sprintf("`%s`", unlist(dat[i,], use.names=FALSE)) , collapse=", ")
        }))

head(inserts)
## [1] "INSERT INTO `results` (`mpg`, `cyl`, `disp`, `hp`, `drat`, `wt`, `qsec`, `vs`, `am`, `gear`, `carb`) VALUES (`21`, `6`, `160`, `110`, `3.9`, `2.62`, `16.46`, `0`, `1`, `4`, `4`);"    
## [2] "INSERT INTO `results` (`mpg`, `cyl`, `disp`, `hp`, `drat`, `wt`, `qsec`, `vs`, `am`, `gear`, `carb`) VALUES (`21`, `6`, `160`, `110`, `3.9`, `2.875`, `17.02`, `0`, `1`, `4`, `4`);"   
## [3] "INSERT INTO `results` (`mpg`, `cyl`, `disp`, `hp`, `drat`, `wt`, `qsec`, `vs`, `am`, `gear`, `carb`) VALUES (`22.8`, `4`, `108`, `93`, `3.85`, `2.32`, `18.61`, `1`, `1`, `4`, `1`);"  
## [4] "INSERT INTO `results` (`mpg`, `cyl`, `disp`, `hp`, `drat`, `wt`, `qsec`, `vs`, `am`, `gear`, `carb`) VALUES (`21.4`, `6`, `258`, `110`, `3.08`, `3.215`, `19.44`, `1`, `0`, `3`, `1`);"
## [5] "INSERT INTO `results` (`mpg`, `cyl`, `disp`, `hp`, `drat`, `wt`, `qsec`, `vs`, `am`, `gear`, `carb`) VALUES (`18.7`, `8`, `360`, `175`, `3.15`, `3.44`, `17.02`, `0`, `0`, `3`, `2`);" 
## [6] "INSERT INTO `results` (`mpg`, `cyl`, `disp`, `hp`, `drat`, `wt`, `qsec`, `vs`, `am`, `gear`, `carb`) VALUES (`18.1`, `6`, `225`, `105`, `2.76`, `3.46`, `20.22`, `1`, `0`, `3`, `1`);"

dat <- iris

inserts <- sprintf("INSERT INTO `%s` (%s) VALUES (%s);",
        "results",
        paste(sprintf("`%s`", colnames(dat)), collapse=", "),
        sapply(1:nrow(dat), function(i) {
          paste(sprintf("`%s`", unlist(dat[i,], use.names=FALSE)) , collapse=", ")
        }))

head(inserts)
## [1] "INSERT INTO `results` (`Sepal.Length`, `Sepal.Width`, `Petal.Length`, `Petal.Width`, `Species`) VALUES (`5.1`, `3.5`, `1.4`, `0.2`, `1`);"
## [2] "INSERT INTO `results` (`Sepal.Length`, `Sepal.Width`, `Petal.Length`, `Petal.Width`, `Species`) VALUES (`4.9`, `3`, `1.4`, `0.2`, `1`);"  
## [3] "INSERT INTO `results` (`Sepal.Length`, `Sepal.Width`, `Petal.Length`, `Petal.Width`, `Species`) VALUES (`4.7`, `3.2`, `1.3`, `0.2`, `1`);"
## [4] "INSERT INTO `results` (`Sepal.Length`, `Sepal.Width`, `Petal.Length`, `Petal.Width`, `Species`) VALUES (`4.6`, `3.1`, `1.5`, `0.2`, `1`);"
## [5] "INSERT INTO `results` (`Sepal.Length`, `Sepal.Width`, `Petal.Length`, `Petal.Width`, `Species`) VALUES (`5`, `3.6`, `1.4`, `0.2`, `1`);"  
## [6] "INSERT INTO `results` (`Sepal.Length`, `Sepal.Width`, `Petal.Length`, `Petal.Width`, `Species`) VALUES (`5.4`, `3.9`, `1.7`, `0.4`, `1`);"

set.seed(1492)
dat <- data.frame(b1=sample(10, 10),
                  b2=sample(10, 10),
                  b3=sample(10, 10),
                  b4=sample(10, 10))

inserts <- sprintf("INSERT INTO `%s` (%s) VALUES (%s);",
        "results",
        paste(sprintf("`%s`", colnames(dat)), collapse=", "),
        sapply(1:nrow(dat), function(i) {
          paste(sprintf("`%s`", unlist(dat[i,], use.names=FALSE)) , collapse=", ")
        }))

head(inserts)
## [1] "INSERT INTO `results` (`b1`, `b2`, `b3`, `b4`) VALUES (`3`, `7`, `7`, `2`);" 
## [2] "INSERT INTO `results` (`b1`, `b2`, `b3`, `b4`) VALUES (`2`, `6`, `4`, `9`);" 
## [3] "INSERT INTO `results` (`b1`, `b2`, `b3`, `b4`) VALUES (`9`, `2`, `2`, `7`);" 
## [4] "INSERT INTO `results` (`b1`, `b2`, `b3`, `b4`) VALUES (`1`, `4`, `5`, `10`);"
## [5] "INSERT INTO `results` (`b1`, `b2`, `b3`, `b4`) VALUES (`7`, `10`, `1`, `6`);"
## [6] "INSERT INTO `results` (`b1`, `b2`, `b3`, `b4`) VALUES (`6`, `9`, `10`, `4`);"

However, there may be better ways of getting this data back into a database if we knew more about the problem you're really trying to solve.
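As mentioned at the top of this answer, the pattern can be wrapped in a function. The following is a minimal sketch, not a definitive implementation. It differs from the snippets above in two ways: values are not wrapped in backticks (MySQL treats backtick-quoted tokens as identifiers rather than values), and any column of the table that is absent from dat is written as NULL. The names make_inserts, to_sql and table_cols are hypothetical, and table_cols is assumed to be supplied by the caller as the table's column names.

```r
# Hypothetical helper: build one INSERT statement per row of `dat`.
# `table_cols` is assumed to list the columns of the target table.
make_inserts <- function(dat, table, table_cols = colnames(dat)) {
  # dat must not contain columns the table lacks
  stopifnot(all(colnames(dat) %in% table_cols))

  # Format one column as SQL literals: bare numbers, single-quoted strings
  # (factors use their labels), and NULL for missing values.
  to_sql <- function(x) {
    if (is.logical(x)) x <- as.integer(x)
    if (is.numeric(x)) {
      out <- as.character(x)
    } else {
      # Doubling single quotes is only a basic form of escaping
      out <- paste0("'", gsub("'", "''", as.character(x), fixed = TRUE), "'")
    }
    out[is.na(x)] <- "NULL"
    out
  }

  # Columns absent from `dat` become NULL; order follows `table_cols`.
  vals <- lapply(table_cols, function(col) {
    if (col %in% colnames(dat)) to_sql(dat[[col]]) else rep("NULL", nrow(dat))
  })

  sprintf("INSERT INTO `%s` (%s) VALUES (%s);",
          table,
          paste(sprintf("`%s`", table_cols), collapse = ", "),
          do.call(paste, c(vals, sep = ", ")))
}

# The example from the question: dat has only b4 and b1
dat <- data.frame(b4 = 7, b1 = 8)
inserts <- make_inserts(dat, "results", table_cols = c("b1", "b2", "b3", "b4"))
print(inserts)
## [1] "INSERT INTO `results` (`b1`, `b2`, `b3`, `b4`) VALUES (8, NULL, NULL, 7);"
```

For anything beyond trusted numeric data, it is probably safer to rely on the quoting helpers of the database interface (for example DBI::dbQuoteString and DBI::dbQuoteIdentifier) than on the hand-rolled escaping shown here.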


3 Comments

This looks like a great solution, but I can't quite get it to work. When I run the statements in inserts, I get the following error: Error in .local(conn, statement, ...) : could not run statement: Unknown column '-0.0366528160' in 'field list'. That value is from the first column, so it looks like it is being treated as a column name. Any thoughts? Thank you.
I have no idea what database you're using, how the schema is set up, or what SQL syntax it accepts. I was just working with the example you provided.
Ok thank you. I will look into the specific syntax.

I don't know whether you have a huge database, but an easy fix is to read the table into R, append the new rows (for example, using dplyr::bind_rows), and then write the whole thing out again.

library(RMySQL)
library(dplyr)

con = dbConnect(RMySQL::MySQL(), dbname = "test")
con %>%
  dbReadTable("results") %>%
  bind_rows(dat) %>%
  dbWriteTable(con, "results", . , overwrite = TRUE)
dbDisconnect(con)

Or, to append without reading the table into R first:

con %>% dbWriteTable("results", dat, append = TRUE)

To create the table the first time:

con %>% dbWriteTable("results", dat)
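When dat may lack some of the table's columns or have them in a different order, one option is to align dat to the table's columns before appending. The following is a minimal sketch under that assumption; align_to_table is a hypothetical helper, and the column names are hard-coded here so the example runs without a database.

```r
# Hypothetical helper: reorder `dat` to match `table_cols`, adding any
# missing columns filled with NA.
align_to_table <- function(dat, table_cols) {
  # dat must not contain columns the table lacks
  stopifnot(all(names(dat) %in% table_cols))
  for (col in setdiff(table_cols, names(dat))) {
    dat[[col]] <- rep(NA, nrow(dat))
  }
  dat[table_cols]
}

# The example from the question: dat has only b4 and b1
dat <- data.frame(b4 = 7, b1 = 8)
aligned <- align_to_table(dat, c("b1", "b2", "b3", "b4"))
print(aligned)
##   b1 b2 b3 b4
## 1  8 NA NA  7

# With an open connection `con` as above, the column names could be read
# from the table itself before appending (not run here):
#   aligned <- align_to_table(dat, dbListFields(con, "results"))
#   dbWriteTable(con, "results", aligned, append = TRUE, row.names = FALSE)
```

The NA values are expected to arrive in the table as NULL, but that is worth checking against your own driver and schema.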

4 Comments

So how exactly would this look?
Yeah I probably don't want to read the entire database into R every time, and I can't get this solution to work. I am assuming it is because I don't currently have data in the database table, and it looks like this solution assumes data already exists there.
There is also an append option in dbWriteTable but it might not work if you want to add to columns that don't exist yet (definitely try it out though). The first time you write out the table, just do dbWriteTable(con, "results", dat)
Thank you for the info, definitely useful!
