How do I update multiple columns in a data.table with values from a matrix? Here is an MWE illustrating the issue I am facing:

library(data.table)
DT = data.table(expand.grid(1:3,1:3,1:3))
DF = expand.grid(1:3,1:3,1:3)
mat = matrix(seq(0, 80), 27, 3)

In a data.frame world I would go with this syntax:

DF[,2:ncol(DF)] = mat[,2:ncol(DF)] #Data frame approach

A similar attempt in data.table syntax produces multiple warnings and unexpected output.

DT[,2:ncol(DF) := mat[,2:ncol(DF)], with=FALSE] #Data table approach

This is obviously faulty: the warning indicates that the matrix was flattened into a single vector.

Warning messages:

1: In `[.data.table`(DT, , `:=`(2:ncol(DF), mat[, 2:ncol(DF)]), with = FALSE) :
  2 column matrix RHS of := will be treated as one vector
  • Note that DT[,3:ncol(DT) - 1] is an often made egregious error - you're subtracting 1 from every number in (3:ncol(DT)), not just ncol(DT) itself. Commented Aug 12, 2015 at 18:50
  • I suggest going through the vignettes on the Getting started page. Especially the Reference Semantics vignette. The RHS of := expects/needs a list. Coercion is unavoidable. Commented Aug 12, 2015 at 19:25
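
Both points raised in the comments above can be checked in a few lines of base R. This is only a rough sketch: n is a made-up stand-in for ncol(DT), and mat is the matrix from the question.

```r
n <- 5  # hypothetical column count, standing in for ncol(DT)

# `:` binds more tightly than binary minus, so 3:n - 1 means (3:n) - 1
a <- 3:n - 1      # 2 3 4
b <- 3:(n - 1)    # 3 4

# A matrix is an atomic vector with a dim attribute, not a list,
# whereas a data.frame is a list of columns
mat <- matrix(seq(0, 80), 27, 3)
mat_is_list <- is.list(mat)                 # FALSE
df_is_list  <- is.list(as.data.frame(mat))  # TRUE
```

The second half is why the RHS needs converting: := wants one list element per target column, and a bare matrix does not provide that.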

2 Answers

You need to convert the RHS to a list, and an easy way to do that is to use as.data.table:

DT[, 2:ncol(DT) := as.data.table(mat[,2:ncol(DT)])]

The with argument is not necessary here, because the LHS is automatically interpreted as column numbers.
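
For completeness, here is a self-contained sketch of the same idea using the question's data, with a quick check that the target columns were replaced and the first column was left alone. The variable name cols is my own, and wrapping it in parentheses on the LHS is just one way of passing column numbers.

```r
library(data.table)

DT  <- data.table(expand.grid(1:3, 1:3, 1:3))
mat <- matrix(seq(0, 80), 27, 3)

cols <- 2:ncol(DT)  # columns to overwrite (hypothetical helper name)

# as.data.table() turns the sub-matrix into a list of columns,
# which := then assigns to the target columns by position
DT[, (cols) := as.data.table(mat[, cols])]

# the updated columns should match the matrix; column 1 should be untouched
ok_col2 <- all(DT[[2]] == mat[, 2])
ok_col3 <- all(DT[[3]] == mat[, 3])
ok_col1 <- all(DT[[1]] == rep(1:3, 9))
```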

2 Comments

I'm curious what you mean when you say this kind of assignment isn't optimized. Is assignment using the := syntax normally optimized in a way that is not possible with this particular usage? Keep in mind I'm a bit of a data.table novice.
@bgoldst My post got edited so that line is no longer there, but I mean "optimized" in the non-technical sense of the word. := was designed to be used where you have specific column names being assigned to and specific object names on the right. Something like data[, c("Speed", "Time") := list(dt2$Distance/dt2$Time, dt2$Time)]

When assigning to multiple columns, the columns should be collected in a list:

idx <- 2:ncol(DT)
DT[,idx] <- lapply(idx, function(col) mat[,col])

This same syntax works for a data.frame. It's nonstandard in a data.table (where set and := are idiomatic), and I'm not sure it modifies DT by reference; := and set are the documented ways to update by reference.

The idiomatic := approach is:

DT[,(idx) := lapply(idx, function(col) mat[,col])]
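
Since set is mentioned above as the other idiomatic option, here is a minimal sketch of what that might look like with the question's data. It assumes a plain loop over column numbers is acceptable; it is not necessarily how you would want to structure real code.

```r
library(data.table)

DT  <- data.table(expand.grid(1:3, 1:3, 1:3))
mat <- matrix(seq(0, 80), 27, 3)

# set() updates one column per call, so loop over the target column numbers
for (col in 2:ncol(DT)) {
  set(DT, j = col, value = mat[, col])
}

# columns 2 and 3 should now hold the matching matrix columns
same_col2 <- all(DT[[2]] == mat[, 2])
same_col3 <- all(DT[[3]] == mat[, 3])
```

Each call passes a plain vector as value, so no list conversion is needed in this form.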

3 Comments

So in a way we are falling back on data.frame syntax. Only that the items we are indexing on should be a list?
Or just DT[,(idx) := data.table(mat[,idx])], no?
@sriramn Yeah. Though I'd advocate the := version (that I just added to the answer), since it's idiomatic and therefore easier to read amidst other data.table code. See Arun's comment on your question for a good place to start.
