I have two data.tables, X (3 million rows by ~500 columns) and Y (100 rows by two columns). A small reproducible example:
```r
library(data.table)
set.seed(1)
X <- data.table( a=letters, b=letters, c=letters, g=sample(c(1:5,7),length(letters),replace=TRUE), key="g" )
Y <- data.table( z=runif(6), g=1:6, key="g" )
```
I want to do a left outer join on X, which I can do with Y[X], thanks to this question: Why does X[Y] join of data.tables not allow a full outer join, or a left join?
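A minimal sketch of that left join on the small example, assuming a current data.table release: Y[X] looks up each row of X in Y by the key g, so every row of X survives, and rows whose g has no partner in Y (g == 7) get NA in z. The name res is only for illustration.

```r
library(data.table)

set.seed(1)
X <- data.table(a = letters, b = letters, c = letters,
                g = sample(c(1:5, 7), length(letters), replace = TRUE), key = "g")
Y <- data.table(z = runif(6), g = 1:6, key = "g")

# Join Y to each row of X on the key g; the default nomatch = NA keeps
# unmatched rows of X, so this behaves as a left outer join on X.
res <- Y[X]

nrow(res) == nrow(X)            # every row of X is kept
all(is.na(res$z[res$g == 7]))   # g == 7 has no partner in Y, so z is NA
```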
But I want to add the new column to X without copying X (since it's huge).
Obviously, something like X <- Y[X] works, but unless data.table is far cleverer than I give it credit for (and I give it credit for quite a lot of deviousness!), I believe this copies the whole of X.
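One way to test the copying worry, sketched under the assumption of a current data.table: address() (exported by data.table) reports where an object lives in memory. The join result Y[X] is a newly built table, so X <- Y[X] rebinds X to a different object, whereas an update-on-join with := (using the on= argument of recent releases, with i.z naming Y's column) adds z to the existing X in place. The variable names here are illustrative.

```r
library(data.table)

set.seed(1)
X <- data.table(a = letters, b = letters, c = letters,
                g = sample(c(1:5, 7), length(letters), replace = TRUE), key = "g")
Y <- data.table(z = runif(6), g = 1:6, key = "g")

addr_orig <- address(X)

# The join result is a brand-new data.table, so assigning it back to X
# would replace the original object rather than extend it.
joined <- Y[X]
same_after_join <- identical(address(joined), addr_orig)

# Update-on-join: for rows of X matching a row of Y on g, set z by reference.
# Rows with no match (g == 7) are left as NA in the new column.
X[Y, z := i.z, on = "g"]
same_after_update <- identical(address(X), addr_orig)

same_after_join    # FALSE: a new object was built
same_after_update  # TRUE: X itself was modified
```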
X[ , z:= Y[X,z]$z ] works, but is kludgy and doesn't scale well to more than one column.
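For more than one column, the same update-on-join pattern extends by listing several targets on the left of :=. This is a sketch with a hypothetical lookup table Y2 carrying two value columns (z and w), assuming a data.table release that supports on=; the i. prefix picks the columns from Y2 rather than from X.

```r
library(data.table)

set.seed(1)
X <- data.table(a = letters, b = letters, c = letters,
                g = sample(c(1:5, 7), length(letters), replace = TRUE), key = "g")
# Hypothetical lookup table with two value columns instead of one.
Y2 <- data.table(z = runif(6), w = rnorm(6), g = 1:6, key = "g")

# Update-on-join: both new columns are added to X by reference; rows of X
# with no match in Y2 (g == 7) get NA in z and w.
X[Y2, c("z", "w") := .(i.z, i.w), on = "g"]
```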
How do I store the results of a merge back into the retained data.table in an efficient (both in terms of copies and in terms of programmer time) way?
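A minimal sketch of the in-place approach for the single-column case, checked against the left-join result from Y[X]. It assumes a data.table release with on= and that Y has at most one row per g (true here, since g is unique in Y). Because x[i] returns rows in the order of i, Y[X]$z lines up with X row by row, which makes the comparison direct.

```r
library(data.table)

set.seed(1)
X <- data.table(a = letters, b = letters, c = letters,
                g = sample(c(1:5, 7), length(letters), replace = TRUE), key = "g")
Y <- data.table(z = runif(6), g = 1:6, key = "g")

# Reference result: the left outer join built as a separate table.
expected_z <- Y[X]$z

# In-place version: X gains the column z by reference; its existing
# columns a, b, c and g are not copied.
X[Y, z := i.z, on = "g"]

all.equal(X$z, expected_z)
```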
… Y[X,z] (and will possibly run into problems doing that if you forget about by-without-by); just X[, z := Y[X]$z] works and seems to be faster for this example, although ultimately X = Y[X] is by far the fastest of the different expressions I've tried so far.

… z in there because I thought that would give DT info about which variables it needed to retain, since it optimizes on that. But your (deleted) point is worth copying here: "watch out for hidden by-without-by when doing smth like Y[X,z]." Even if it's fast, if X = Y[X] creates a copy I'm potentially in trouble.... Y[X,list(z)] instead?
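On the by-without-by question: my understanding is that data.table 1.9.4 stopped making by-without-by the default, so in current releases j is evaluated once over the whole join. A sketch of what the two spellings then return, assuming a recent version: Y[X, z] gives a plain vector aligned with X's rows, and Y[X, list(z)] gives a one-column data.table. The names zvec and zdt are illustrative.

```r
library(data.table)

set.seed(1)
X <- data.table(a = letters, b = letters, c = letters,
                g = sample(c(1:5, 7), length(letters), replace = TRUE), key = "g")
Y <- data.table(z = runif(6), g = 1:6, key = "g")

# Without by-without-by, j is evaluated once over the whole join result.
zvec <- Y[X, z]        # numeric vector, one element per row of X
zdt  <- Y[X, list(z)]  # data.table with the single column z

# The vector is already aligned with X's rows, so it can be assigned directly.
X[, z := zvec]
```

This still materializes the joined z vector before assigning it, so the update-on-join form X[Y, z := i.z, on = "g"] remains the more direct way to add the column by reference.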