I'm trying to apply a function to every row of a data.table, using multiple columns as inputs, where the output for each input row could be one or two rows of a data.frame/matrix/what-have-you. My data.table has 800,000 rows.
Here is my closest attempt. The things at play here are, of course, correctness, efficiency, and how easy the output structure is to work with.
library(data.table)

d0 = as.Date("2014/01/01")
sdays = seq(d0, d0 + 99, by = 1)
gg = data.table(id = 1:100, event_date = sdays)
setkey(gg, id)

# Returns a single row (a vector) when day == d0, otherwise a two-row matrix.
# Note: rcomb is never assigned when delta == 100.
test_func = function(id, day) {
  delta = day - d0
  if (delta == 0) {
    rcomb = c(id, 0, 100, 1, 0)
  } else if (delta != 100) {
    r1 = c(id, 0, delta, 0, 0)
    r2 = c(id, delta, 100, 1, 0)
    rcomb = rbind(r1, r2)
  }
  rcomb
}

att = gg[, test_func(get("id"), get("event_date")), by = id]
att
Any ideas on how to use fast data.table tricks here? I've been at it for hours and haven't gotten much closer :/ As for the output, I would prefer a list with one entry per original row, so that I could just call do.call with rbind on it. Thanks!
So let me give an example of the desired output, but in a horribly inefficient way:
some_list = vector("list", 100)
for (i in 1:100) {
  some_list[[i]] <- test_func(gg$id[i], gg$event_date[i])
}
happy = do.call(rbind, some_list)
head(happy)
   [,1] [,2] [,3] [,4] [,5]
      1    0  100    1    0
r1    2    0    1    0    0
r2    2    1  100    1    0
r1    3    0    2    0    0
r2    3    2  100    1    0
r1    4    0    3    0    0
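For reference, the same list-then-rbind result can be built without the explicit for loop. This is only a sketch in base R: it rebuilds the example data as plain vectors (ids and days are names made up here), converts delta to a plain number with as.numeric, and does nothing to address speed on 800,000 rows.

```r
# Self-contained sketch: rebuilds the example inputs as plain vectors.
d0 <- as.Date("2014/01/01")
ids <- 1:100                       # hypothetical stand-in for gg$id
days <- seq(d0, d0 + 99, by = 1)   # hypothetical stand-in for gg$event_date

test_func <- function(id, day) {
  delta <- as.numeric(day - d0)    # number of days as a plain numeric
  if (delta == 0) {
    rcomb <- c(id, 0, 100, 1, 0)
  } else if (delta != 100) {
    rcomb <- rbind(r1 = c(id, 0, delta, 0, 0),
                   r2 = c(id, delta, 100, 1, 0))
  }
  rcomb
}

# One list entry per input row; each entry is a vector or a 2-row matrix.
some_list <- lapply(seq_along(ids), function(i) test_func(ids[i], days[i]))

# Stack all entries into a single 5-column matrix.
happy <- do.call(rbind, some_list)
head(happy)
```

The list keeps the one-entry-per-row structure, so it is still possible to tell afterwards which input rows produced one output row and which produced two.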
Edit: In response to the transpose comment. If you run gg[, test_func(id, event_date), id] and then head(att, n=20), you'll notice the irregular pattern that joins the last parts of each vector sequentially. This is also a problem because, in the real data, one cannot be sure whether a given row will produce 1-row or 2-row output.
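One possible way around the irregular pattern, offered as a sketch rather than a tested solution for the full 800,000 rows: the matrix returned in j appears to be treated as one long vector, whereas a list of equal-length vectors is treated as columns. The variant below, test_func_cols, is hypothetical, and so are the column names start, end, flag and extra, since the original output columns are unnamed. It also assumes every non-zero delta takes the two-row branch, which differs from the original function for delta == 100.

```r
library(data.table)

d0 <- as.Date("2014/01/01")
gg <- data.table(id = 1:100, event_date = seq(d0, d0 + 99, by = 1))

# Hypothetical variant of test_func: returns a list of columns, not a matrix.
# Each list element has length 1 (one output row) or 2 (two output rows).
test_func_cols <- function(day) {
  delta <- as.numeric(day - d0)    # plain numeric so column types stay consistent
  if (delta == 0) {
    list(start = 0, end = 100, flag = 1, extra = 0)
  } else {
    # Assumption: all non-zero deltas produce two rows.
    list(start = c(0, delta), end = c(delta, 100),
         flag = c(0, 1), extra = c(0, 0))
  }
}

# by = id adds the id column itself, so the function only needs the date.
att <- gg[, test_func_cols(event_date), by = id]
head(att)
```

Because the grouping column is carried along by by = id, the result is a single data.table with one or two rows per id, and no do.call(rbind, ...) step is needed afterwards.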