What is the best way to assign to multiple columns using data.table? For example:

f <- function(x) {c("hi", "hello")}
x <- data.table(id = 1:10)

I would like to do something like this (of course this syntax is incorrect):

x[ , (col1, col2) := f(), by = "id"]

And to extend that, I may have many columns with names stored in a variable (say col_names) and I would like to do:

x[ , col_names := another_f(), by = "id", with = FALSE]

What is the correct way to do something like this?

  • This looks like it has been answered: stackoverflow.com/questions/11308754/… Commented Jul 27, 2012 at 20:52
  • Alex, that answer is close, but it doesn't seem to work in combination with by, as @Christoph_J correctly points out. Link to your question added to FR#2120 "Drop needing with=FALSE for LHS of :=", so it won't be forgotten when that is revisited. Commented Aug 8, 2012 at 15:29
  • To be clear, f() is a function returning multiple values, one for each of your columns. Commented May 4, 2018 at 6:10

2 Answers

This now works in v1.8.3 on R-Forge. Thanks for highlighting it!

library(data.table)

x <- data.table(a = 1:3, b = 1:6)
f <- function(x) {list("hi", "hello")}  # must return a list: element i fills LHS column i
x[ , c("col1", "col2") := f(), by = a][]
#    a b col1  col2
# 1: 1 1   hi hello
# 2: 2 2   hi hello
# 3: 3 3   hi hello
# 4: 1 4   hi hello
# 5: 2 5   hi hello
# 6: 3 6   hi hello

x[ , c("mean", "sum") := list(mean(b), sum(b)), by = a][]
#    a b col1  col2 mean sum
# 1: 1 1   hi hello  2.5   5
# 2: 2 2   hi hello  3.5   7
# 3: 3 3   hi hello  4.5   9
# 4: 1 4   hi hello  2.5   5
# 5: 2 5   hi hello  3.5   7
# 6: 3 6   hi hello  4.5   9 

mynames = c("Name1", "Longer%")
x[ , (mynames) := list(mean(b) * 4, sum(b) * 3), by = a][]
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27


x[ , get("mynames") := list(mean(b) * 4, sum(b) * 3), by = a][]  # same
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27

x[ , eval(mynames) := list(mean(b) * 4, sum(b) * 3), by = a][]   # same
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27
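In the first example f() ignores its argument, so every group receives the same constants. In practice the function usually receives the group's own data. A minimal sketch, assuming a hypothetical helper range_cols() that returns an unnamed list with one element per target column: inside each group of a, b holds only that group's rows, and := assigns the list elements to the LHS names in order.

```r
library(data.table)

x <- data.table(a = 1:3, b = 1:6)

# Hypothetical helper: one list element per new column.
# Returning a list (not c()) is what lets := map element i to LHS column i.
range_cols <- function(v) list(min(v), max(v))

# Within each group of a, b contains only that group's rows,
# so lo/hi receive the per-group min/max of b, recycled over the group.
x[, c("lo", "hi") := range_cols(b), by = a]
```

The same pattern works with a variable of names, e.g. x[, (mynames) := range_cols(b), by = a].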

Older version using the with argument. This was deprecated in v1.9.4 and now emits a warning, so prefer the parenthesized (mynames) := form above:

x[ , mynames := list(mean(b) * 4, sum(b) * 3), by = a, with = FALSE][] # same result, plus a deprecation warning
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27
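Dropping with = FALSE without adding parentheses changes the meaning rather than just silencing the warning: a bare symbol on the LHS of := is taken literally as the column name. A small sketch on a fresh table, with cols as a hypothetical variable holding the target names:

```r
library(data.table)

dt <- data.table(a = 1:3)
cols <- c("p", "q")  # hypothetical variable holding target column names

# Bare symbol: adds ONE column literally named "cols" (the variable is ignored).
dt[, cols := 0L]

# Parenthesized: cols is evaluated, so columns "p" and "q" are added.
dt[, (cols) := list(a * 2L, a * 3L)]

names(dt)
```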
Comments

@dnlbrky dim returns a vector, so converting it to a list should rotate it; e.g. [,c("rows","cols"):=as.list(dim(get(objectName))),by=objectName]. The trouble is that as.list has call overhead and also copies the small vector. If efficiency becomes a problem as the number of groups rises, please let us know.
Hi Matt. The first example in your second code block (i.e. x[,mynames:=list(mean(b)*4,sum(b)*3),by=a,with=FALSE][]) now throws a warning, so maybe remove it? On a related note, has anyone suggested that, with options(datatable.WhenJisSymbolThenCallingScope=TRUE), an assignment like x[,mynames:=list(mean(b)*4,sum(b)*3),by=a] should in fact work? Seems like that would be consistent with the other changes, though I guess it might break too much existing user code (?).
@PanFrancisco Without by=a it will work, but return a different answer. With by=a, the mean(b) and sum(b) aggregates are computed per group and recycled within each group. Without by=a, the mean and sum of the entire column are put into every cell (i.e. different numbers).
@MattDowle What if my function already returns a named list? Is there any way I can add the columns to the dt without having to name them again? E.g. with f <- function(x) {list("c"="hi", "d"="hello")}, x[ , f(), by = a][] prints the results with named columns, but I don't know how to append the result to the dt.
@Jfly That would be a good new question, which would likely lead to a feature request filed on GitHub. Perhaps something like x[, {ans=f(); names(ans):=ans}, by=a] could be implemented. That syntax conveys the intent quite nicely to my eye. What do you think?
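Two ideas from these comments can be sketched with existing syntax. The first follows the as.list(dim(...)) suggestion, using the built-in mtcars and iris datasets as stand-ins for the commenter's objects (an assumption). The second is a workaround for the named-list question, not the names(ans) := syntax floated above (that syntax is only a proposal): take the names from one call to the function, then pass them to (nm) :=. The helper f_named() is hypothetical.

```r
library(data.table)

# 1) Rotate dim()'s vector into a list so it fills two columns per object.
objs <- data.table(objectName = c("mtcars", "iris"))
objs[, c("rows", "cols") := as.list(dim(get(objectName))), by = objectName]

# 2) Reuse the names of a function that already returns a named list.
f_named <- function() list(c = "hi", d = "hello")  # hypothetical helper
x <- data.table(a = 1:3, b = 1:6)
nm <- names(f_named())  # computed once, outside the grouping
x[, (nm) := f_named(), by = a]
```

The workaround calls the function once more to get the names; that is cheap here but worth noting if the function is expensive.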
The following shorthand notation might be useful. All credit goes to Andrew Brooks, specifically this article.

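library(data.table)
dt <- as.data.table(mtcars)  # assumption: mpg and cyl come from the built-in mtcars data
dt[,`:=`(avg=mean(mpg), med=median(mpg), min=min(mpg)), by=cyl]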

Comment

This is so much better and more readable than the c() := list()
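The functional `:=`(name = value, ...) form and the c(...) := list(...) form build the same columns; the functional form just keeps each name next to its expression. A quick check, assuming the built-in mtcars data (the answer does not say where dt comes from):

```r
library(data.table)

dt1 <- as.data.table(mtcars)
dt2 <- copy(dt1)  # independent copy, so := on one table does not modify the other

# Functional form: name = expression pairs inline.
dt1[, `:=`(avg = mean(mpg), med = median(mpg), min = min(mpg)), by = cyl]

# LHS/RHS form: names in one vector, values in a list, matched by position.
dt2[, c("avg", "med", "min") := list(mean(mpg), median(mpg), min(mpg)), by = cyl]

all.equal(dt1, dt2)
```

The c() := list() form remains useful when the names are held in a variable, as in (mynames) := in the first answer.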
