I have 5 variables (var1, var2, etc.), all coded like this:

Factor w/ 2 levels "no","yes": 2 1 1 2 1 2 1 1 1 1 ...

I would like to combine them into one. So far I have only used:

comb_drug <- with(dt1, interaction(var1, var2, var3, var4, var5))

which gives a variable with 32 levels. I would now like to create a variable with the following 3 levels:

  • all 5 are yes
  • any 4 are yes
  • fewer than 4 are yes

What is the best way to do this? Here is some example data:

var1 <- as.factor(c(2,2,1,2,2,1,2,1,2,2))
var2 <- as.factor(c(2,1,2,2,2,1,2,2,2,2))
var3 <- as.factor(c(2,2,1,2,2,2,2,2,1,2))
var4 <- as.factor(c(2,2,1,2,2,2,2,2,1,2))
var5 <- as.factor(c(2,2,2,1,2,1,2,1,1,2))

dt <- data.frame(var1,var2,var3,var4,var5)

for ( i in 1:5) {
    levels(dt[,i]) <- c("no","yes")
}

   var1 var2 var3 var4 var5
1   yes  yes  yes  yes  yes
2   yes   no  yes  yes  yes
3    no  yes   no   no  yes
4   yes  yes  yes  yes   no
5   yes  yes  yes  yes  yes
6    no   no  yes  yes   no
7   yes  yes  yes  yes  yes
8    no  yes  yes  yes   no
9   yes  yes   no   no   no
10  yes  yes  yes  yes  yes

I would instead like

    newvar
1   allyes
2   4yes
3   lessthan4yes
4   4yes
5   allyes
6   lessthan4yes
7   allyes
8   lessthan4yes
9   lessthan4yes
10  allyes

3 Answers

3

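An alternative that might be slightly faster than apply(x, 1, sum) is rowSums. The + 1 makes a row with no "yes" at all pick the first label instead of indexing position 0, which R silently drops:

dt$nYes <- rep(c('<4','4','all'), times = c(4,1,1))[rowSums(dt == 'yes') + 1]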
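The indexing trick above can be unpacked step by step. This is only a sketch on the question's example data: it uses the label names from the question rather than '<4', '4', 'all', and the names labels_by_count, n_yes and newvar are made up for illustration.

```r
# Rebuild the example data from the question (1 = "no", 2 = "yes")
dt <- data.frame(
  var1 = c(2, 2, 1, 2, 2, 1, 2, 1, 2, 2),
  var2 = c(2, 1, 2, 2, 2, 1, 2, 2, 2, 2),
  var3 = c(2, 2, 1, 2, 2, 2, 2, 2, 1, 2),
  var4 = c(2, 2, 1, 2, 2, 2, 2, 2, 1, 2),
  var5 = c(2, 2, 2, 1, 2, 1, 2, 1, 1, 2)
)
dt[] <- lapply(dt, factor, levels = 1:2, labels = c("no", "yes"))

# One label for each possible count 0, 1, ..., 5 (hypothetical helper vector)
labels_by_count <- rep(c("lessthan4yes", "4yes", "allyes"), times = c(4, 1, 1))

# Number of "yes" values in each row
n_yes <- rowSums(dt == "yes")

# Counts start at 0 but R positions start at 1, hence the + 1
newvar <- factor(labels_by_count[n_yes + 1],
                 levels = c("lessthan4yes", "4yes", "allyes"))
print(table(newvar))
```

Wrapping the result in factor() with explicit levels keeps the three categories in a sensible order even if one of them does not occur in the data.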

2

This should get you on your way... Just add up the number of "yes" values per row:

dt$newvar <- apply(dt, 1, function(x) sum(x == "yes"))
dt$newvar
#  [1] 5 4 2 4 5 2 5 3 2 5

From there, you can do some clever factoring to get what you need... or this might be good enough for your purposes.

Actually, rowSums would probably be a lot faster:

dt$newvar <- rowSums(dt == "yes")
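One possible way to do the "clever factoring" is to bin the counts with cut(). This is a sketch, not necessarily what was meant above; the breaks and label names are assumptions taken from the three levels the question asks for, and n_yes is a made-up name.

```r
# Example data from the question, built directly as "no"/"yes" factors
dt <- data.frame(
  var1 = factor(c(2, 2, 1, 2, 2, 1, 2, 1, 2, 2), levels = 1:2, labels = c("no", "yes")),
  var2 = factor(c(2, 1, 2, 2, 2, 1, 2, 2, 2, 2), levels = 1:2, labels = c("no", "yes")),
  var3 = factor(c(2, 2, 1, 2, 2, 2, 2, 2, 1, 2), levels = 1:2, labels = c("no", "yes")),
  var4 = factor(c(2, 2, 1, 2, 2, 2, 2, 2, 1, 2), levels = 1:2, labels = c("no", "yes")),
  var5 = factor(c(2, 2, 2, 1, 2, 1, 2, 1, 1, 2), levels = 1:2, labels = c("no", "yes"))
)

# Count the "yes" values per row before adding any new column to dt
n_yes <- rowSums(dt == "yes")

# cut() uses right-closed intervals by default: (-Inf, 3], (3, 4], (4, 5]
dt$newvar <- cut(n_yes,
                 breaks = c(-Inf, 3, 4, 5),
                 labels = c("lessthan4yes", "4yes", "allyes"))
print(dt$newvar)
```

cut() returns a factor whose levels follow the order of the breaks, so no separate relevelling step is needed.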

1

If you subtract 1 from all your data (the raw 1/2 codes, before they are turned into factors), you'll have zeroes and ones, which are directly interpretable as FALSE/TRUE, which makes software jocks happier :-). As an added bonus, for a vector of TRUE/FALSE (or 1 and 0) values, sum(myvector) gives you the number of TRUE values directly. At that point, you could even have a look-up matrix like

sum  label
0    allno
1    one_yes
2    lessthan4yes
3    lessthan4yes
4    4yes
5    allyes

and do a direct replacement as newvec <- lutmat[match(sums, lutmat[,1]), 2], where sums holds the per-row counts.
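A look-up of this kind could be sketched as follows. The table is written here as a data frame called lut (a made-up name), the labels are collapsed to the three levels requested in the question, which is an assumption rather than part of the table above, and sums is filled with the per-row counts of the question's example data.

```r
# Hypothetical look-up table: one row per possible number of "yes" values
lut <- data.frame(
  sum   = 0:5,
  label = c(rep("lessthan4yes", 4), "4yes", "allyes"),
  stringsAsFactors = FALSE
)

# Per-row counts of "yes" for the example data in the question
sums <- c(5, 4, 2, 4, 5, 2, 5, 3, 2, 5)

# match() finds, for every count, the row of the table that holds it
newvec <- lut$label[match(sums, lut$sum)]
print(newvec)
```

Comparing with == would recycle the shorter vector instead of looking each count up, which is why match() is used here.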
