8

I'm doing some analysis on a large volume of data stored in a PostgreSQL database. For speed and memory reasons I'm using the data.table package. Currently I'm doing this to read the data.

library(RPostgreSQL)
library(data.table)
...
query <- "SELECT * FROM eqtl"
data <- as.data.table(dbGetQuery(con, query))

I'm wondering if there is a better way to do this that doesn't involve reading the whole thing into a data.frame and then copying it into a data.table.

2
  • 6
    Perhaps not an answer you're looking for (?), but if you've already read it as a data.frame, you can use setDT() to convert it to data.table by reference (the conversion should be instant from data.frame to data.table) - i.e., data <- dbGetQuery(...); setDT(data). Commented Feb 6, 2015 at 21:43
  • @Arun thanks, this is probably what I'll end up doing, although I would have liked to remove data.frames from the equation altogether. Commented Feb 6, 2015 at 22:12

1 Answer 1

2

As Arun pointed out in the comment you can just use setDT on dbGetQuery results.

Additionally there is a helper function available in my package dwtools which extends this feature for auto setkey when needed. This was designed to be useful when chaining. It also unifies the interface to other database vendors so you can chain data.table using different databases.
The simple select usage will looks like:

my_dt = db("SELECT * FROM eqtl")
# to setkey use
db("SELECT * FROM eqtl", key="mykeycol")

Heavily extended example from package manual:

jj_aggr = quote(list(amount=sum(amount), value=sum(value)))
r <- db("sales",key="geog_code" # read fact table from db
        )[,eval(jj_aggr),keyby=c("geog_code","time_code") # aggr by geog_code and time_code
          ][,db(.SD) # write to db, auto.table.name
            ][,db("geography",key="geog_code" # read lookup geography dim from db
                  )[.SD # left join geography
                    ][,eval(jj_aggr), keyby=c("time_code","geog_region_name")] # aggr
              ][,db(.SD) # write to db, auto.table.name
                ][,db("time",key="time_code" # read lookup time dim from db
                      )[.SD # left join time
                        ][, eval(jj_aggr), keyby=c("geog_region_name","time_month_code","time_month_name")] # aggr
                  ][,db(.SD) # write to db, auto.table.name
                    ]

It would read data from multiple databases, joins, aggregates, save intermediate results to multiple databases.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.