48

I'm trying to manipulate a number of data.tables in similar ways, and would like to write a function to accomplish this. I would like to pass in a parameter containing a list of columns that would have the operations performed. This works fine when the vector declaration of columns is the left hand side of the := operator, but not if it is declared earlier (or passed into the function). The follow code shows the issue.

dt = data.table(a = letters, b = 1:2, c=1:13)
colsToDelete = c('b', 'c')
dt[,colsToDelete := NULL] # doesn't work but I don't understand why not.
dt[,c('b', 'c') := NULL] # works fine, but doesn't allow passing in of columns

The error is "Adding new column 'colsToDelete' then assigning NULL (deleting it)." So clearly, it's interpreting 'colsToDelete' as a new column name.

The same issue occurs when doing something along these lines

dt[, colNames := lapply(.SD, adjustValue, y=factor), .SDcols = colNames]

I new to R, but rather more experienced with some other languages, so this may be a silly question.

0

4 Answers 4

71

It's basically because we allow symbols on LHS of := to add new columns, for convenience: ex: DT[, col := val]. So, in order to distinguish col itself being the name from whatever is stored in col being the column names, we check if the LHS is a name or an expression.

If it's a name, it adds the column with the name as such on the LHS, and if expression, then it gets evaluated.

DT[, col := val] # col is the column name.

DT[, (col) := val]  # col gets evaluated and replaced with its value
DT[, c(col) := val] # same as above

The preferred idiom is: dt[, (colsToDelete) := NULL]

HTH

Sign up to request clarification or add additional context in comments.

1 Comment

I know this is a very old post now and there is a solution, but if removecols is a vector of column names you'd like to remove, you can do this: DT<-DT[,.SD,.SDcols=-removecols]
24

I am surprised no answer provided uses the set() function.

set(DT, , colsToDelete, NULL)

This should be the easiest.

2 Comments

two commas in a row?
@wolfsatthedoor The empty argument is the i argument, which refers to rows. As it's omitted, it indicates that all rows are to be updated.
15

To extend on previous answer, you can delete columns by reference doing:

# delete columns 10 to 15
dt[ , (10:15) := NULL ]

or

# delete columns 3, 5 and 10 to 15
dt[ , (c(3,5,10:15)) := NULL ]

1 Comment

to add on this, you can also do dt[ , -(10:15) ] or dt[ , -c(3,5,10:15)]
0

This code did the job for me. you need to have the position of the columns to be deleted e.g., posvec as mentioned in the ?set

j: Column name(s) (character) or number(s) (integer) to be assigned value when column(s) already exist, and only column name(s) if they are to be created.

DT_removed_slected_col = set(DT, j = posvec, value = NULL)

Also if you want to get the posvec you can try this:

selected_col = c('col_a','col_b',...)

selected_col = unlist(sapply(selected_col, function(x) grep(x,names(DT)))) 

namvec = names(selected_col) #col names

posvec = unname(selected_col) #col positions

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.