modify values of multiple columns in a table

Question

So, here is my sample data:

library(data.table)
mydata <- fread(
"sample,neg1,neg2,neg3,gen1,gen2
sample1,   0,   1,   2,  30, 60
sample2,   1,   0,   1,  15, 30
sample3,   2,   1,   0,  10, 20
")

In each row I want to subtract the background (the mean of the "neg" columns in that row, truncated to an integer). My current code is the following:

negatives <- grep("^neg", names(mydata), value = TRUE) # "neg1" "neg2" "neg3"
# wrap the LHS in parentheses instead of the deprecated with=F for :=
mydata[, (names(mydata)[-1]) := {
  bg <- mean(unlist(.SD[, negatives, with = FALSE]))  # background of this row
  .SD - as.integer(bg)
}, by = sample]

# mydata
#    sample neg1 neg2 neg3 gen1 gen2
#1: sample1   -1    0    1   29   59
#2: sample2    1    0    1   15   30
#3: sample3    1    0   -1    9   19

It does the job, but it runs quite slowly on my real, much bigger table. I assume that is because of .SD. Is there a better way to do this task, perhaps using set?

(This question is very similar to my previous one, but the source data is in a different form here, so I could not find a way to apply the same solution with set. I hope it will not be considered a duplicate.)

I came up with a two-step solution. You can check whether it is faster than yours: mydata1 <- mydata[ , V1:=list(as.integer(rowMeans(.SD))), .SDcols=indx]; mydata1[, names(mydata1)[-c(1,7)]:= .SD-mydata1[['V1']], .SDcols=2:6][,V1:=NULL][]
– akrun, Commented Feb 20, 2015 at 5:10
Another option would be to get the rowMeans of the selected columns separately and then use set to update all the columns. I updated the solution.
– akrun, Commented Feb 20, 2015 at 5:18
Thanks for pointing that out. I modified it and moved it to the end (it would look strange as a 4th comment here; in addition, I think the previous question I am linking to may also be useful for someone).
– Vasily A, Commented Feb 20, 2015 at 19:09
How about melting the whole thing (I mean converting your data frame into a 'long' format using melt from reshape2 or gather from tidyr), after which the problem becomes trivial?
– Marat Talipov, Commented Feb 20, 2015 at 19:13
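
The melting idea from the last comment could look roughly like the sketch below. It uses data.table's own melt and dcast rather than reshape2 or tidyr (my substitution, not what the comment names), and the names long, column, value, bg and result are made up for illustration. I have not benchmarked it, so whether it beats the by-row version on a large table is an open question.

```r
library(data.table)

mydata <- fread(
"sample,neg1,neg2,neg3,gen1,gen2
sample1,   0,   1,   2,  30, 60
sample2,   1,   0,   1,  15, 30
sample3,   2,   1,   0,  10, 20
")

# Long format: one row per (sample, column) pair.
long <- melt(mydata, id.vars = "sample",
             variable.name = "column", value.name = "value")

# Per-sample background: mean of the "neg" rows, truncated as in the question.
long[, bg := as.integer(mean(value[grepl("^neg", column)])), by = sample]
long[, value := value - bg]

# Back to the original wide layout.
result <- dcast(long, sample ~ column, value.var = "value")
result
```

Note that as.integer truncates, so sample2 (mean of 1, 0, 1) gets a background of 0 and its row stays unchanged.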

MichaelChirico · Accepted Answer · 2020-04-18 02:24:18Z


You could compute the rowMeans of the "neg" columns (val), then use set to subtract val from every column of the dataset except the first.

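 indx <- grep('^neg', names(mydata))            # positions of the "neg" columns
 val <- as.integer(rowMeans(mydata[, ..indx]))  # per-row background, truncated to integer
 for (j in 2:ncol(mydata)) {                    # every column except "sample"
   set(mydata, i = NULL, j = j, value = mydata[[j]] - val)  # update by reference
 }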

 mydata
 #    sample neg1 neg2 neg3 gen1 gen2
 #1: sample1   -1    0    1   29   59
 #2: sample2    1    0    1   15   30
 #3: sample3    1    0   -1    9   19
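
The same update can probably also be written without the explicit loop, using := with .SDcols. This is only a sketch of an alternative, not a claim that it is faster than set (I have not timed it); neg_cols and cols are names I made up here.

```r
library(data.table)

mydata <- fread(
"sample,neg1,neg2,neg3,gen1,gen2
sample1,   0,   1,   2,  30, 60
sample2,   1,   0,   1,  15, 30
sample3,   2,   1,   0,  10, 20
")

# Names of the background columns and of all columns to be updated.
neg_cols <- grep("^neg", names(mydata), value = TRUE)
cols <- names(mydata)[-1]

# Per-row background, computed once and truncated to integer.
val <- as.integer(mydata[, rowMeans(.SD), .SDcols = neg_cols])

# Subtract the background from every selected column, by reference.
mydata[, (cols) := lapply(.SD, function(x) x - val), .SDcols = cols]
mydata
```

The key point in both versions is that the row means are computed once, in a vectorised call, instead of once per group with by=sample.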

answered Feb 20, 2015 at 4:35 by akrun; edited Apr 18, 2020 at 2:24 by MichaelChirico
