3

I have a data.table in R and want to create a new column. Let's say that I have the date column name saved as a variable and want to append _year to that name in the new column. I'm able to do that the normal route by just specifying the name, but how can I create the new column name using the date_col variable.

Here is what I've tried. The last two, which I want, don't work.

dat = data.table(one = 1:5, two = 1:5, 
                 order_date = lubridate::ymd("2015-01-01","2015-02-01","2015-03-01",
                           "2015-04-01","2015-05-01"))
dat
date_col = "order_date"
dat[,`:=`(OrderDate_year = substr(get(date_col)[!is.na(get(date_col))],1,4))][]
dat[,`:=`(new = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][]
dat[,`:=`(paste0(date_col, "_year", sep="") = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][]
dat[,`:=`(noquote(paste0(date_col, "_year", sep="")) = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][]

2 Answers 2

3

The last two statements return an error message:

dat[,`:=`(paste0(date_col, "_year", sep="") = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][]
Error: unexpected '=' in "dat[,`:=`(paste0(date_col, "_year", sep="") ="
dat[,`:=`(noquote(paste0(date_col, "_year", sep="")) = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][]
Error: unexpected '=' in "dat[,`:=`(noquote(paste0(date_col, "_year", sep="")) ="

The correct syntax for calling the :=() function is:

dat[, `:=`(paste0(date_col, "_year", sep = ""), 
           substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))], 1, 4))][]
dat[, `:=`(noquote(paste0(date_col, "_year", sep = "")), 
           substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))], 1, 4))][]

i.e., replace = by ,.


However, assignment syntax and right hand side are far too complicated.

The order_date column is already of class Date:

str(dat)
Classes ‘data.table’ and 'data.frame':    5 obs. of  3 variables:
 $ one       : int  1 2 3 4 5
 $ two       : int  1 2 3 4 5
 $ order_date: Date, format: "2015-01-01" "2015-02-01" ...
 - attr(*, ".internal.selfref")=<externalptr>

In order to extract the year, year() function can be used (either from the data.table package or the lubridate package whatever is loaded last), so no conversion back to character and extraction of the year string is required:

date_col = "order_date"
dat[, paste0(date_col, "_year") := lapply(.SD, year), .SDcols = date_col][]
   one two order_date order_date_year
1:   1   1 2015-01-01            2015
2:   2   2 2015-02-01            2015
3:   3   3 2015-03-01            2015
4:   4   4 2015-04-01            2015
5:   5   5 2015-05-01            2015

Alternatively,

dat[, paste0(date_col, "_year") := year(get(date_col))][]
dat[, `:=`(paste0(date_col, "_year"), year(get(date_col)))][]

work as well.

Sign up to request clarification or add additional context in comments.

Comments

1

The set function is nice for doing this. Quicker than setting inside the data.table too. Is this what you are after? http://brooksandrew.github.io/simpleblog/articles/advanced-data-table/#fast-looping-with-set

library(data.table)
dat = data.table(one = 1:5, two = 1:5, 
                 order_date = lubridate::ymd("2015-01-01","2015-02-01","2015-03-01",
                           "2015-04-01","2015-05-01"))
dat
date_col = "order_date"

year_col <- paste0(date_col, "_year", sep="")
set(dat, j = year_col, value = substr(dat[[date_col]], 1, 4) )

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.