all:

The problem arose when I used a for loop to create five data.tables and store them in a list. Here is an example:

library(data.table)
set.seed(123)    
df <- as.data.table(list(rnorm(10,1,1), rnorm(10,1,1)))
list <- list() 
for(i in 2011:2015){
  list[[paste0("A_",i)]] <- df[, year := as.numeric(i)]
}

I expected the value of year in each element to equal the i used to create it. For example, year in list[1] should be 2011. However, the code above returns 2015 for year in every element:

> list[1]
$A_2011
            V1         V2 year
 1:  0.4395244  2.2240818 2015
 2:  0.7698225  1.3598138 2015
 3:  2.5587083  1.4007715 2015
 4:  1.0705084  1.1106827 2015
 5:  1.1292877  0.4441589 2015
 6:  2.7150650  2.7869131 2015
 7:  1.4609162  1.4978505 2015
 8: -0.2650612 -0.9666172 2015
 9:  0.3131471  1.7013559 2015
10:  0.5543380  0.5272086 2015

I can't figure out what's wrong with my code. I would appreciate it if anyone could point out the problem. I would also like to see other solutions, for example using lapply. Many thanks!

  • Quick fix: Add list[[paste0("A_",i)]]$year <- i in your loop. Commented Apr 20, 2017 at 15:11
  • Generally a lot less painful to keep it in one table, like yrs = 2011:2015; res <- df[rep(1:.N, length(yrs))][, year := rep(yrs, each = nrow(df))][] or df[, rbindlist(Map(cbind, .(.SD), year = yrs))] Commented Apr 20, 2017 at 15:14
  • @Majo The fix works! However, I really want to know why my code does not work as expected. I think the logic is correct there. Commented Apr 20, 2017 at 15:18
  • @Frank Thanks! What a beautiful solution! Commented Apr 20, 2017 at 15:18

2 Answers

for(i in 2011:2015){
  list[[paste0("A_",i)]] <- df[, year := as.numeric(i)]
} 

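I would appreciate if anyone could point out the problem here.

:= modifies df in place (by reference) and returns that same table, so <- stores a pointer to the one data.table, df, in every list element instead of making a copy. After the loop, every element therefore shows the last year assigned, 2015. Wrapping df in copy() before applying := should fix that. However, it's cleaner to work with a single big table: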

yrs = 2011:2015
res <- df[, rbindlist(Map(cbind, .(.SD), year = yrs))]

This has a couple of advantages:

  • Lists of data.tables run into subtle issues with pointers, such as the one in the question.
  • A big data.table allows one to use by= to iterate over subtables, which can be a lot more efficient.
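
The copy() fix mentioned above can be sketched as follows. This is only an illustration under a few assumptions: the list is named res (a hypothetical name, chosen to avoid masking base::list()), the columns are named V1 and V2 as in the question, and the final grouped mean is just an arbitrary example of by=.

```r
library(data.table)

set.seed(123)
df <- data.table(V1 = rnorm(10, 1, 1), V2 = rnorm(10, 1, 1))

# `res` is a hypothetical name; it avoids masking base::list()
res <- list()
for (i in 2011:2015) {
  # copy() makes a deep copy first, so := modifies the copy and not df
  res[[paste0("A_", i)]] <- copy(df)[, year := as.numeric(i)]
}

# Each element keeps its own year; df itself gains no year column
sapply(res, function(x) unique(x$year))
names(df)

# Stacking the copies gives one big table that can be summarised with by=
big <- rbindlist(res)
big[, .(mean_V1 = mean(V1)), by = year]
```

Because every element is an independent copy, a later := on one element should not leak into the others or into df.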

1 Comment

Clear enough! Thank you @Frank

This worked for me, using dplyr's mutate to add the new column instead of := as above:

library(data.table)
library(dplyr)  # provides %>% and mutate()

setNames(lapply(2011:2015, function(i){
    as.data.table(list(rnorm(10,1,1), rnorm(10,1,1))) %>% 
        mutate(year = i)
}), sprintf("A_%s", 2011:2015))

Edit for seed handling:

setNames(lapply(2011:2015, function(i){
    set.seed(123)
    as.data.table(list(rnorm(10,1,1), rnorm(10,1,1))) %>% 
        mutate(year = i)
}), sprintf("A_%s", 2011:2015))

Output:

List of 5
 $ A_2011:'data.frame': 10 obs. of  3 variables:
  ..$ V1  : num [1:10] 0.44 0.77 2.56 1.07 1.13 ...
  ..$ V2  : num [1:10] 2.224 1.36 1.401 1.111 0.444 ...
  ..$ year: int [1:10] 2011 2011 2011 2011 2011 2011 2011 2011 2011 2011
 $ A_2012:'data.frame': 10 obs. of  3 variables:
  ..$ V1  : num [1:10] 0.44 0.77 2.56 1.07 1.13 ...
  ..$ V2  : num [1:10] 2.224 1.36 1.401 1.111 0.444 ...
  ..$ year: int [1:10] 2012 2012 2012 2012 2012 2012 2012 2012 2012 2012
 $ A_2013:'data.frame': 10 obs. of  3 variables:
  ..$ V1  : num [1:10] 0.44 0.77 2.56 1.07 1.13 ...
  ..$ V2  : num [1:10] 2.224 1.36 1.401 1.111 0.444 ...
  ..$ year: int [1:10] 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013
 $ A_2014:'data.frame': 10 obs. of  3 variables:
  ..$ V1  : num [1:10] 0.44 0.77 2.56 1.07 1.13 ...
  ..$ V2  : num [1:10] 2.224 1.36 1.401 1.111 0.444 ...
  ..$ year: int [1:10] 2014 2014 2014 2014 2014 2014 2014 2014 2014 2014
 $ A_2015:'data.frame': 10 obs. of  3 variables:
  ..$ V1  : num [1:10] 0.44 0.77 2.56 1.07 1.13 ...
  ..$ V2  : num [1:10] 2.224 1.36 1.401 1.111 0.444 ...
  ..$ year: int [1:10] 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015
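
For comparison, the same lapply pattern can probably be written with data.table alone by combining := with copy(). The sketch below is not part of the original answer; it assumes the question's df is built once up front, and the names yrs and res are illustrative.

```r
library(data.table)

set.seed(123)
df <- data.table(V1 = rnorm(10, 1, 1), V2 = rnorm(10, 1, 1))
yrs <- 2011:2015

# copy() gives each iteration its own table, so := cannot touch df
# or the tables created in other iterations
res <- setNames(
  lapply(yrs, function(i) copy(df)[, year := i]),
  paste0("A_", yrs)
)

str(res, max.level = 1)
```

Here year is stored as an integer because yrs is an integer vector; wrap i in as.numeric() if a double column is preferred.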

2 Comments

Thank you! Using dplyr is definitely another good way to go. But do you know why my code does not work as expected?
Sadly, I'm not well versed enough in data.table to explain it more clearly.
