Assign multiple columns using := in data.table, by group

Question

What is the best way to assign to multiple columns using data.table? For example:

f <- function(x) {c("hi", "hello")}
x <- data.table(id = 1:10)

I would like to do something like this (of course this syntax is incorrect):

x[ , (col1, col2) := f(), by = "id"]

And to extend that, I may have many columns with names stored in a variable (say col_names) and I would like to do:

x[ , col_names := another_f(), by = "id", with = FALSE]

What is the correct way to do something like this?

This looks like it has been answered: stackoverflow.com/questions/11308754/…
– Alex, Commented Jul 27, 2012 at 20:52
Alex, that answer is close but it doesn't seem to work in combination with by, as @Christoph_J is correct to say. Link to your question added to FR#2120 "Drop needing with=FALSE for LHS of :=", so it won't get forgotten to revisit.
– Matt Dowle, Commented Aug 8, 2012 at 15:29
To be clear, f() is a function returning multiple values, one for each of your columns.
– smci, Commented May 4, 2018 at 6:10

MichaelChirico · Accepted Answer · 2019-09-27 07:35:06Z

196

This now works in v1.8.3 on R-Forge. Thanks for highlighting it!

library(data.table)

x <- data.table(a = 1:3, b = 1:6)
# f returns a list with one item per new column
f <- function(x) {list("hi", "hello")}
x[ , c("col1", "col2") := f(), by = a][]
#    a b col1  col2
# 1: 1 1   hi hello
# 2: 2 2   hi hello
# 3: 3 3   hi hello
# 4: 1 4   hi hello
# 5: 2 5   hi hello
# 6: 3 6   hi hello

x[ , c("mean", "sum") := list(mean(b), sum(b)), by = a][]
#    a b col1  col2 mean sum
# 1: 1 1   hi hello  2.5   5
# 2: 2 2   hi hello  3.5   7
# 3: 3 3   hi hello  4.5   9
# 4: 1 4   hi hello  2.5   5
# 5: 2 5   hi hello  3.5   7
# 6: 3 6   hi hello  4.5   9 

mynames = c("Name1", "Longer%")
x[ , (mynames) := list(mean(b) * 4, sum(b) * 3), by = a][]
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27

x[ , get("mynames") := list(mean(b) * 4, sum(b) * 3), by = a][]  # same
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27

x[ , eval(mynames) := list(mean(b) * 4, sum(b) * 3), by = a][]   # same
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27

Older syntax using the with argument (discouraged; prefer one of the forms above where possible):

x[ , mynames := list(mean(b) * 4, sum(b) * 3), by = a, with = FALSE][] # same
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27

edited Sep 27, 2019 at 7:35 by MichaelChirico

answered Oct 6, 2012 at 8:48 by Matt Dowle


15 Comments

Matt Dowle Over a year ago

@dnlbrky dim returns a vector so converting that to type list should rotate it; e.g. [,c("rows","cols"):=as.list(dim(get(objectName))),by=objectName]. Trouble is that as.list has call overhead and also copies the small vector. If efficiency is a problem as number of groups rises then please let us know.
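
The as.list() idea in this comment also covers the question's original f, which returns a plain vector via c(). A minimal sketch, assuming a made-up helper rng() (not part of data.table) that returns a length-2 vector:

```r
library(data.table)

# Hypothetical helper for illustration: returns a plain vector of length 2
rng <- function(v) c(min(v), max(v))

x <- data.table(a = rep(1:3, 2), b = 1:6)

# as.list() turns the length-2 vector into a 2-item list,
# so each item fills one of the two new columns within each group
x[, c("lo", "hi") := as.list(rng(b)), by = a]
```

Here lo and hi should hold the per-group minimum and maximum of b, recycled across the rows of each group.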

Josh O'Brien Over a year ago

Hi Matt. The first example in your second code block (i.e. x[,mynames:=list(mean(b)*4,sum(b)*3),by=a,with=FALSE][]) now throws a warning, so maybe remove it? On a related note, has anyone suggested that, with options(datatable.WhenJisSymbolThenCallingScope=TRUE), an assignment like x[,mynames:=list(mean(b)*4,sum(b)*3),by=a] should in fact work? Seems like that would be consistent with the other changes, though I guess it might break too much existing user code (?).

Matt Dowle Over a year ago

@PanFrancisco Without by=a it will work, but return a different answer. The mean(b) and sum(b) aggregates are being recycled within each group when by=a. Without by=a it just sticks the mean and sum for the entire column into each cell (i.e. different numbers).
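
To make that difference concrete, here is a small sketch on a table shaped like the one in the answer. The column names grp_* and all_* are made up for illustration:

```r
library(data.table)

x <- data.table(a = rep(1:3, 2), b = 1:6)

# Grouped: the aggregates are computed per value of a,
# then recycled across the rows of that group
x[, c("grp_mean", "grp_sum") := list(mean(b), sum(b)), by = a]

# Ungrouped: one mean and one sum for the whole column,
# recycled into every row
x[, c("all_mean", "all_sum") := list(mean(b), sum(b))]
```

The grp_* columns vary with a, while the all_* columns hold the same value in every row.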

Feng Jiang Over a year ago

@MattDowle what if my function already returns a named list, is there any way I can add the columns to the dt without having to name them again? e.g. f <- function(x) {list("c"="hi", "d"="hello")} will print results with named cols with x[ , f(), by = a][] . I don't know how to append the result to the dt.

Matt Dowle Over a year ago

@Jfly That would be a good new question which would likely lead to a feature request filed on GitHub. Perhaps something like x[, {ans=f(); names(ans):=ans}, by=a] could be implemented. That syntax conveys the intent quite nicely to my eye. What do you think?
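
The syntax proposed in this comment is only a suggestion. One possible workaround with the forms already shown in the answer is to read the names from a single call made outside the data.table and reuse them on the left-hand side. This is a sketch, and it assumes f() can be called without the grouped data and always returns the same names:

```r
library(data.table)

x <- data.table(a = rep(1:3, 2), b = 1:6)

# The function from the comment above: it returns a named list
f <- function(x) list(c = "hi", d = "hello")

# Take the column names from one call made outside the data.table,
# then assign by reference with the parenthesised-LHS form
cols <- names(f())
x[, (cols) := f(), by = a]
```

The cost is one extra call to f(), so this may not suit functions that are expensive or that need the group's data to run.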


Gerry · Accepted Answer · 2018-04-01 11:37:02Z

86

The following shorthand notation might be useful. All credit goes to Andrew Brooks, specifically this article.

dt[,`:=`(avg=mean(mpg), med=median(mpg), min=min(mpg)), by=cyl]
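
The line above does not define dt. A self-contained version, assuming dt is the built-in mtcars dataset converted to a data.table (the mpg and cyl columns suggest this, but it is an assumption):

```r
library(data.table)

# Assumption: dt is the built-in mtcars dataset as a data.table
dt <- as.data.table(mtcars)

# Functional form of `:=`: each named argument becomes a new column,
# computed per group of cyl and recycled within the group
dt[, `:=`(avg = mean(mpg), med = median(mpg), min = min(mpg)), by = cyl]
```

With this form the new column names are written inline, so it does not apply when the names are held in a character vector; the (mynames) := list(...) form covers that case.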

answered Apr 1, 2018 at 11:37 by Gerry

1 Comment

caiohamamura Over a year ago

This is so much better and more readable than the c() := list()
