I have some SQL queries that basically parse a dataset by time (POSIXct date format):

library(sqldf)
data_2013 <- sqldf("SELECT * FROM data WHERE strftime('%Y-%m-%d', time,
'unixepoch', 'localtime') >= '2013-01-01' AND strftime('%Y-%m-%d', time,
'unixepoch', 'localtime') <= '2013-12-31'")

data_2012 <- sqldf("SELECT * FROM data WHERE strftime('%Y-%m-%d', time,
'unixepoch', 'localtime') >= '2012-01-01' AND strftime('%Y-%m-%d', time,
'unixepoch', 'localtime') <= '2012-12-31'")

data_2011 <- sqldf("SELECT * FROM data WHERE strftime('%Y-%m-%d', time,
'unixepoch', 'localtime') >= '2011-01-01' AND strftime('%Y-%m-%d', time, 
'unixepoch', 'localtime') <= '2011-12-31'")

However, this code seems very clumsy to me. Is there a neat way of wrapping this up into a function or some other way of making it shorter, while still spitting out the same 3 separate datasets?

2 Answers

between and fn$. Use between and factor out the strftime expression, prefacing sqldf with fn$ to perform string interpolation:

Time <- "strftime('%Y-%m-%d', time, 'unixepoch', 'localtime')"
st <- '2013-01-01'
en <- '2013-12-31'
fn$sqldf("select * from data where $Time between '$st' AND '$en' ")

If desired, this could readily be made into a function, as could the remaining solutions.
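
As a minimal sketch of such a function, assuming data lives in the global environment and has a POSIXct column named time: the helper name between_dates and the sample rows are hypothetical and only there to make the example self-contained.

```r
library(sqldf)

# Hypothetical sample data; the real `data` has a POSIXct column named `time`.
data <- data.frame(
  id   = 1:4,
  time = as.POSIXct(c("2011-06-15 12:00:00", "2012-06-15 12:00:00",
                      "2013-03-01 12:00:00", "2013-09-30 12:00:00"))
)

# The strftime expression, written once and interpolated into the query.
Time <- "strftime('%Y-%m-%d', time, 'unixepoch', 'localtime')"

# Hypothetical helper: rows of `data` whose date lies between st and en.
# fn$ substitutes $Time, $st and $en into the string before sqldf runs it.
between_dates <- function(st, en) {
  fn$sqldf("select * from data where $Time between '$st' and '$en'")
}

data_2013 <- between_dates("2013-01-01", "2013-12-31")
data_2012 <- between_dates("2012-01-01", "2012-12-31")
data_2011 <- between_dates("2011-01-01", "2011-12-31")
```

This still yields the three separate data frames asked for, with the date logic in one place.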

Year. When selecting a whole year, this can be simplified:

Year <- "strftime('%Y', time, 'unixepoch', 'localtime')"
yr <- '2013'    
sql <- "select * from data where $Year = '$yr' "  
fn$sqldf(sql)

We could create a list of data frames like this:

Map(function(yr) fn$sqldf(sql), as.character(2011:2013))

R/sqldf. Another possibility is to add a character column in R first:

data$Year <- format(data$time, "%Y")
yr <- '2013'    
sql <- "select * from data where Year = '$yr' "
fn$sqldf(sql)

R. Note that it's not that hard to do this directly in R:

yr <- "2013"
subset(data, format(time, "%Y") == yr)

Also, to split it into a list of data frames, one per year:

split(data, format(data$time, "%Y"))
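
To get back to three separately named data frames from that list, something like the following might work. It is only a sketch: the sample rows are made up, the time zone is fixed to UTC so the result is reproducible, and the data_ prefix simply mirrors the names used in the question.

```r
# Hypothetical sample data with a POSIXct column named `time` (UTC).
data <- data.frame(
  id   = 1:4,
  time = as.POSIXct(c("2011-06-15", "2012-06-15", "2013-03-01", "2013-09-30"),
                    tz = "UTC")
)

# One data frame per year; list names are the year strings.
by_year <- split(data, format(data$time, "%Y"))
year_names <- names(by_year)   # "2011" "2012" "2013"

# Individual years can be pulled out of the list by name.
rows_2013 <- by_year[["2013"]]

# Optionally create data_2011, data_2012, data_2013 as separate objects.
names(by_year) <- paste0("data_", year_names)
list2env(by_year, envir = globalenv())
```

Keeping the list is often more convenient than separate objects, since lapply can then be used to process every year the same way.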

H2. sqldf can also work with certain other databases. The problem with SQLite is that it has no date/time type, whereas the H2 database supports date/times directly as a type, which greatly simplifies the query. If sqldf sees that RH2 is loaded, it will use it rather than SQLite:

library(RH2)
library(sqldf) 
yr <- 2013
sql <- "select * from data where year(time) = $yr"
fn$sqldf(sql)

2 Comments

Is $Year a typo, or is it an sqldf function ($) to access .GlobalEnv variables?
$Year signifies string interpolation of the variable Year, which is defined in the prior line.

With paste0 you can achieve this:

sqlfun <- function(startdate, stopdate) {
  sqldf(paste0("SELECT * FROM data WHERE strftime('%Y-%m-%d', time,
    'unixepoch', 'localtime') >= '", startdate, "' AND strftime('%Y-%m-%d', time,
    'unixepoch', 'localtime') <= '", stopdate, "'"))
}

sqlfun('2013-01-01','2013-12-31')

2 Comments

Is it supposed to be paste or paste0?
paste0(x) does the same as paste(x, sep = "").
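
A small illustration of why the separator matters here, using a shortened, hypothetical fragment of the query string rather than the full statement:

```r
startdate <- "2013-01-01"

# paste0 joins the pieces with no separator.
with_paste0 <- paste0("time >= '", startdate, "'")

# paste defaults to sep = " ", which puts spaces inside the quotes.
with_paste <- paste("time >= '", startdate, "'")

# paste with sep = "" matches paste0.
with_sep <- paste("time >= '", startdate, "'", sep = "")

print(with_paste0)  # "time >= '2013-01-01'"
print(with_paste)   # "time >= ' 2013-01-01 '"
```

With plain paste the date literal picks up a leading and trailing space, so the string comparison in the WHERE clause would no longer behave as intended.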
