
I have a function that given an input vector returns a data.frame or data.table; the number of columns and the names of the columns depend on the input. I want to add these columns to an existing data.table using one of the columns of the data.table as input for the function. What is the easiest/cleanest way of doing this in a data.table?

library(data.table)

# Example function; in this case the number of columns the function
# returns is fixed, but in practice the number of columns and the
# names of the columns depend on x
my_function <- function(x) {
  name <- deparse1(substitute(x))
  res <- data.table(x == 1, x == 2)
  names(res) <- paste0(name, "==", 1:2)
  res
}

# Example data set
dta <- data.table(a = sample(1:10, 10, replace = TRUE), b = letters[1:10])
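The comments in my_function say that in practice the number and names of the returned columns depend on x. The sketch below makes that concrete with a hypothetical variant, my_function_var (not part of the original question), which returns one logical column per distinct value of x. Because of this, the column count is only known after the call. Like the original, it relies on deparse1(), which needs R 4.0.0 or later.

```r
library(data.table)

# Hypothetical variant of my_function: one logical column per distinct
# value of x, so both the number and the names of the columns depend on x
my_function_var <- function(x) {
  name <- deparse1(substitute(x))   # "a" when called as my_function_var(a)
  vals <- sort(unique(x))
  res <- as.data.table(lapply(vals, function(v) x == v))
  setnames(res, paste0(name, "==", vals))
  res
}

set.seed(1)
dta <- data.table(a = sample(1:10, 10, replace = TRUE), b = letters[1:10])
res <- dta[, my_function_var(a)]
print(names(res))
```

Any approach to the question has to cope with a result like this one, where the new column names cannot be written down in advance.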

I can create new columns using this function:

> dta[, my_function(a)]
     a==1  a==2
 1: FALSE FALSE
 2: FALSE FALSE
 3: FALSE FALSE
 4: FALSE FALSE
 5: FALSE FALSE
 6:  TRUE FALSE
 7: FALSE FALSE
 8:  TRUE FALSE
 9: FALSE  TRUE
10:  TRUE FALSE

However, I also want to keep the existing columns. The following does what I want, but I expect there is a simpler/better solution. I also expect that cbind will copy the data, which is another reason to avoid it, as my data sets are quite large.
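One rough way to see the difference between cbind and := is data.table's address() function, which returns the memory address of an R object. In this minimal sketch, new_cols is a hypothetical stand-in for the output of my_function(a). cbind() builds a new data.table that must be assigned back to dta. By contrast, := adds the columns to the existing object by reference, so its address stays the same. This only shows that cbind creates a new table object. It does not by itself measure how much column data gets duplicated.

```r
library(data.table)

dta <- data.table(a = c(1L, 2L, 1L), b = c("x", "y", "z"))
# Hypothetical stand-in for the output of my_function(a)
new_cols <- data.table(`a==1` = dta$a == 1, `a==2` = dta$a == 2)

addr_before <- address(dta)

# cbind() returns a new data.table object
dta_cbind <- cbind(dta, new_cols)
addr_cbind <- address(dta_cbind)

# := adds the columns to dta itself, by reference
dta[, (names(new_cols)) := new_cols]
addr_after <- address(dta)

cat("cbind creates a new object:", addr_cbind != addr_before, "\n")
cat(":= keeps the same object:", addr_after == addr_before, "\n")
```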

> dta <- cbind(dta, dta[, my_function(a)])
> dta
     a b  a==1  a==2
 1:  1 a  TRUE FALSE
 2:  8 b FALSE FALSE
 3:  2 c FALSE  TRUE
 4:  4 d FALSE FALSE
 5: 10 e FALSE FALSE
 6:  4 f FALSE FALSE
 7:  8 g FALSE FALSE
 8: 10 h FALSE FALSE
 9:  8 i FALSE FALSE
10:  4 j FALSE FALSE
  • I doubt there is anything simpler, better or shorter than cbind. You could do dta[, (LETTERS[2:3]) := my_function(a)] if you know the number of columns that will be returned beforehand, but unfortunately in your case you don't. Commented Aug 18, 2020 at 12:47
  • If you just want to keep the input column, you may adjust my_function:
    my_function <- function(x) {
      name <- deparse1(substitute(x))
      res <- data.table(x = x, x == 1, x == 2)
      names(res)[2:3] <- paste0(name, "==", 1:2)
      names(res)[1] <- name
      res
    }
    Commented Aug 18, 2020 at 12:52
  • @janderkran In my example the original dataset only has one column a, but in practice this is one column of a large set of columns (I will change the example). Commented Aug 18, 2020 at 12:56
  • @RonakShah The data sets I will apply this to are quite large. I would have to check, but I am afraid the dta <- cbind(dta, dta[...]) introduces a copy of my data. I know the syntax of the example you show, but in that case, as you mention, you have to know the number of columns, and also fix the names of columns. Commented Aug 18, 2020 at 12:58
  • This is a 3-step approach: 1. tmp <- dta[, my_function(a)] 2. cols <- paste0('cols', seq_along(tmp)) 3. dta[, (cols) := tmp]. Not sure if it qualifies as an answer. Commented Aug 18, 2020 at 13:01
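The last comment's three steps can also be written with data.table's set(), shown here as a swapped-in technique. set() assigns one column at a time by reference and can create a new column when j is given as a character name. The sketch assumes the names returned by my_function should be kept rather than replaced with generic ones. The name tmp follows the comment, and the small dta is illustrative.

```r
library(data.table)

my_function <- function(x) {
  name <- deparse1(substitute(x))
  res <- data.table(x == 1, x == 2)
  names(res) <- paste0(name, "==", 1:2)
  res
}

dta <- data.table(a = c(1L, 2L, 4L, 1L), b = letters[1:4])

# Step 1: compute the new columns once
tmp <- dta[, my_function(a)]

# Step 2: add each column to dta by reference, keeping the names from tmp
for (col in names(tmp)) {
  set(dta, j = col, value = tmp[[col]])
}
print(dta)
```

Compared with a single := call, the loop does the same work column by column. It can be handy when the new columns are produced one at a time.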

1 Answer


Here is one way that avoids copying the original data.table object:

library(data.table)
# Compute the new columns once, as a temporary data.table
tmp <- dta[, my_function(a)]
# Choose names for the new columns (names(tmp) keeps the names from my_function)
cols <- paste0('cols', seq_along(tmp))
# Add the new columns to dta by reference
dta[, (cols) := tmp]
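If this is needed in several places, the steps can be wrapped in a small function. The helper add_cols_by_ref below is hypothetical (not from the answer). It is a sketch that assumes the caller computes the new columns with dt[, my_function(...)] and wants to keep their names. Because := works by reference, the data.table passed in is modified in the caller's scope as well.

```r
library(data.table)

my_function <- function(x) {
  name <- deparse1(substitute(x))
  res <- data.table(x == 1, x == 2)
  names(res) <- paste0(name, "==", 1:2)
  res
}

# Hypothetical helper: add every column of new_cols to dt by reference,
# keeping the names that new_cols already carries. Existing columns of dt
# with the same names would be overwritten.
add_cols_by_ref <- function(dt, new_cols) {
  stopifnot(is.data.table(dt), nrow(new_cols) == nrow(dt))
  dt[, (names(new_cols)) := new_cols]
  invisible(dt)
}

dta <- data.table(a = c(1L, 2L, 3L, 1L), b = letters[1:4])
add_cols_by_ref(dta, dta[, my_function(a)])
print(dta)
```

The helper only works cleanly if dta has no column that is itself named new_cols. A column with that name would take precedence inside the j expression.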

Benchmark added by OP

Below is the script I used to benchmark the solutions:

library(data.table)
my_function <- function(x) {
  name <- deparse1(substitute(x))
  res <- data.table(x == 1, x == 2)
  names(res) <- paste0(name, "==", 1:2)
  res
}
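set.seed(1)
N <- 2E7
x <- sample(1:10, N, replace = TRUE)
# 24 integer columns (a to x), each holding the same N values
dta <- data.table()
dta[, (letters[1:24]) := x]

# Solution from the answer: add the new columns by reference
t <- system.time({
  tmp <- dta[, my_function(a)]
  cols <- names(tmp)
  dta[, (cols) := tmp]
})
# cbind solution (swap with the block above to time it)
#t <- system.time({
#  dta <- cbind(dta, dta[, my_function(a)])
#})
print(t)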

I ran the script on Linux (Ubuntu 20.04) with /bin/time -v Rscript bench.R. time -v reports peak memory use in the field "Maximum resident set size (kbytes)".

For the cbind solution the reported user time was 1.362 seconds and max memory 4206072 kbytes.

For the solution above the reported user time was 0.339 seconds and max memory 2486996 kbytes.

The solution above is therefore roughly 4 times faster (0.339 s vs. 1.362 s user time) and uses about 40% less peak memory than the cbind version.
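Timings only mean something if both versions produce the same table. As a quick sanity check, the sketch below runs both solutions on a much smaller, made-up data set (N = 1000 instead of 2E7) and compares column names and values. It uses copy() so that the := version does not modify the table the cbind version reads from.

```r
library(data.table)

my_function <- function(x) {
  name <- deparse1(substitute(x))
  res <- data.table(x == 1, x == 2)
  names(res) <- paste0(name, "==", 1:2)
  res
}

# A much smaller version of the benchmark data
set.seed(1)
N <- 1000
dta <- data.table(a = sample(1:10, N, replace = TRUE),
                  b = sample(letters, N, replace = TRUE))

# cbind solution: builds a new table
res_cbind <- cbind(dta, dta[, my_function(a)])

# := solution, applied to a copy so that dta itself is left unchanged
res_ref <- copy(dta)
tmp <- res_ref[, my_function(a)]
res_ref[, (names(tmp)) := tmp]

same_names <- identical(names(res_cbind), names(res_ref))
same_values <- all(mapply(identical, res_cbind, res_ref))
cat("same names:", same_names, " same values:", same_values, "\n")
```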


1 Comment

Thanks. I will wait for possible better answers; for now I will use this one.
