
So, here is my sample data:

library(data.table)
mydata <- fread(
"sample,neg1,neg2,neg3,gen1,gen2
sample1,   0,   1,   2,  30, 60
sample2,   1,   0,   1,  15, 30
sample3,   2,   1,   0,  10, 20
")

In each row I want to subtract the background (the mean of the "neg" columns) from every numeric column. My current code is the following:
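For the sample data above, the per-row background can be checked directly. This is a minimal sketch that rebuilds the same table with `data.table()` instead of `fread()` and uses base `rowMeans()`; `bg` is just an illustrative name. Note that `as.integer()` truncates toward zero, so the background of 2/3 in `sample2` becomes 0, which is why that row comes out unchanged.

```r
library(data.table)

# Same sample data as above, built without fread()
mydata <- data.table(
  sample = c("sample1", "sample2", "sample3"),
  neg1 = c(0L, 1L, 2L), neg2 = c(1L, 0L, 1L), neg3 = c(2L, 1L, 0L),
  gen1 = c(30L, 15L, 10L), gen2 = c(60L, 30L, 20L)
)

negatives <- grep("^neg", names(mydata), value = TRUE)

# Mean of the "neg" columns for every row at once: c(1, 2/3, 1)
bg <- rowMeans(mydata[, negatives, with = FALSE])

# Truncation toward zero, as in as.integer(bg): c(1L, 0L, 1L)
as.integer(bg)
```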

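negatives <- grep("^neg", names(mydata), value = TRUE) # "neg1" "neg2" "neg3"
cols <- names(mydata)[-1]                              # every column except "sample"
mydata[, (cols) := {
  bg <- mean(unlist(.SD[, negatives, with = FALSE]))   # background of this row
  .SD - as.integer(bg)                                 # as.integer() truncates toward zero
}, by = sample]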

# mydata
#    sample neg1 neg2 neg3 gen1 gen2
#1: sample1   -1    0    1   29   59
#2: sample2    1    0    1   15   30
#3: sample3    1    0   -1    9   19

It does the job, but it is quite slow on my real (much larger) table; I assume that is because of .SD. Is there a better way to do this, perhaps using set?

(This question is very similar to my previous one, but the source data has a different shape here, so I could not find a way to apply the same set-based solution. I hope it will not be considered a duplicate.)

  • Oops, sorry about that. Commented Feb 20, 2015 at 5:06
  • I came up with a two-step solution. You can check whether it is faster than yours (indx holds the positions of the "neg" columns, as in the answer below): mydata1 <- mydata[ , V1:=list(as.integer(rowMeans(.SD))), .SDcols=indx]; mydata1[, names(mydata1)[-c(1,7)]:= .SD-mydata1[['V1']], .SDcols=2:6][,V1:=NULL][] Commented Feb 20, 2015 at 5:10
  • Another option would be to get the rowMeans of the selected columns separately and then use set to update all the columns. I updated the answer. Commented Feb 20, 2015 at 5:18
  • Thanks for pointing that out; I modified it and moved it to the end (it would look strange as a fourth comment here, and I think the previous question I am linking to may also be useful to someone). Commented Feb 20, 2015 at 19:09
  • How about melting the whole thing (I mean converting your data frame into a 'long' format using melt from reshape2 or gather from tidyr), after which the problem becomes trivial? Commented Feb 20, 2015 at 19:13
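The long-format idea from the last comment can be sketched with data.table's own melt() and dcast() methods; this swaps in data.table's reshaping functions for the reshape2/tidyr ones the comment names. It is a sketch that keeps the question's as.integer() truncation; the names `long` and `result` are illustrative, and whether this beats set() on a large table has not been measured here.

```r
library(data.table)

mydata <- data.table(
  sample = c("sample1", "sample2", "sample3"),
  neg1 = c(0L, 1L, 2L), neg2 = c(1L, 0L, 1L), neg3 = c(2L, 1L, 0L),
  gen1 = c(30L, 15L, 10L), gen2 = c(60L, 30L, 20L)
)

# Long format: one row per (sample, original column) pair
long <- melt(mydata, id.vars = "sample",
             variable.name = "col", value.name = "value")

# Background per sample: mean over the "neg" rows only, truncated toward zero
long[, bg := as.integer(mean(value[grepl("^neg", col)])), by = sample]
long[, value := value - bg]

# Back to wide format; columns follow the factor levels of `col`
result <- dcast(long, sample ~ col, value.var = "value")
print(result)
```

In long format the subtraction is a plain grouped operation, which is the sense in which the comment calls the problem trivial; the cost is the two reshaping steps.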

1 Answer


You could compute the rowMeans of the "neg" columns ("val") once, then use set to update every column of the dataset except the 1st by subtracting "val" from it.

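 indx <- grep('^neg', names(mydata))                # positions of the "neg" columns
 # background per row, computed once before any column is modified
 val <- as.integer(rowMeans(mydata[, ..indx]))
 for(j in 2:ncol(mydata)){                          # every column except "sample"
  set(mydata, i=NULL, j=j, value=mydata[[j]]-val)   # update column j by reference
 }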

 mydata
 #    sample neg1 neg2 neg3 gen1 gen2
 #1: sample1   -1    0    1   29   59
 #2: sample2    1    0    1   15   30
 #3: sample3    1    0   -1    9   19
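If an explicit loop is not wanted, the same update can be written as a single `:=` call over all non-id columns, again computing the background once and without `by`. This is a sketch, not the answerer's code; `cols` is an illustrative name. set() in a loop skips the overhead of `[.data.table` on each call, so which variant is faster on a given table is worth measuring rather than assuming.

```r
library(data.table)

mydata <- data.table(
  sample = c("sample1", "sample2", "sample3"),
  neg1 = c(0L, 1L, 2L), neg2 = c(1L, 0L, 1L), neg3 = c(2L, 1L, 0L),
  gen1 = c(30L, 15L, 10L), gen2 = c(60L, 30L, 20L)
)

negatives <- grep("^neg", names(mydata), value = TRUE)
cols <- setdiff(names(mydata), "sample")   # every column except the id

# Background computed once, before any column is modified
val <- as.integer(rowMeans(mydata[, negatives, with = FALSE]))

# Subtract val from every selected column in one by-reference update
mydata[, (cols) := lapply(.SD, function(x) x - val), .SDcols = cols]
print(mydata)
```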