3

I have a 50GB SQLite database file and I want to calculate and add new variables. Can you leverage Moody_Mudskipper's function or something using ALTER TABLE and UPDATE to create the variable instead of an entire table?

library(dbplyr)
    library(DBI)
    con <- DBI::dbConnect(RSQLite::SQLite(), path = ":memory:")
    copy_to(con, head(iris,3),"iris")

create <- function(data,name){
   DBI::dbSendQuery(data$src$con,
                    paste("CREATE TABLE", name,"AS", dbplyr::sql_render(data)))
                             }

tbl(con,"iris") %>% 
   mutate(Sepal.Area= Sepal.Length * Sepal.Width) %>% 
   create("iris_2")
2
  • 1
    I hadn't seen this but meanwhile updated my answer there: stackoverflow.com/questions/50766908/… and it seems to work quite nicely Commented Jun 17, 2018 at 11:33
  • Ok, thank you Moody_Mudskipper! Commented Jun 17, 2018 at 16:32

1 Answer 1

5

Here is an easy solution using pool and DBI since you can directly write and execute any valid SQL statement.

library(DBI)
library(pool)
library(RSQLite)

#Database
#Create a connection
pool <- dbPool(drv =RSQLite::SQLite(),dbname="")
#Colse open connection when it is no longer needed, to prevent leaks
poolClose(pool)


dbWriteTable(conn = pool, name = "mtcars", value = mtcars)

insert_new_column <- function(table_name, column_name, column_type){
        query <- sqlInterpolate(pool,"ALTER TABLE ?table ADD COLUMN ?column ?type;",table=table_name, column=column_name, type=column_type)

        dbExecute(pool,query)

}

insert_new_column(table_name="mtcars", column_name="test", column_type="REAL")  #More types: INTEGER, TEXT

dbGetQuery(pool,"Select * from mtcars;")

head(dbReadTable(pool, "mtcars"))
   mpg cyl disp  hp drat    wt  qsec vs am gear carb test
1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4  NA
2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4  NA
3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1  NA
4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1  NA
5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2  NA
6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1  NA

Then you can do:

tbl(pool,"mtcars") %>% 
    mutate(test= cyl * 2)

Update

con <- dbConnect(RSQLite::SQLite(), ":memory:")

dbWriteTable(con, "mtcars", mtcars)
insert_new_column2 <- function(table_name, column_name, column_type){
      dbWithTransaction(
            con,{
                   dbExecute(con, sqlInterpolate(con,"ALTER TABLE ?table ADD COLUMN ?column ?type;",table=table_name, column=column_name, type=column_type))

                   dbExecute(con, sqlInterpolate(con,"UPDATE ?table SET ?column = cyl*2;",table=table_name, column=column_name))

  }) 
 }
Sign up to request clarification or add additional context in comments.

6 Comments

When I close the connection and reconnect, the test shows NAs again. The mutate is not updating the values permanently. It gets updated with dbSendQuery(con, "UPDATE mtcars SET test=cyl*2"), but I need to do that through the function call. Can you help with that?
In fact, instead of UPDATE, can you try and make the effect of that mutate command permanent on the test variable?
I think mutate need an assignment to a new data frame as it does the calculation on the fly. Meanwhile, I add a new function for Update.
Wow...This is AMAZING! Sorry for stretching this, but can the UPDATE include SQL equivalent of dplyr's "mtcars %>% group_by(carb, vs, am) %>% mutate(test= mean(disp, na.rm = T))"
tbl(pool,"mtcars") %>% group_by(carb, vs, am) %>% summarise(test= mean(disp, na.rm = T)) %>% show_query() helped!
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.