Combine multiple binary variables into one in R

Question

Here is the story:

I have a data set with many binary variables (1 = yes, 0 = no). The problem is that many of these variables record the same thing under different names. For example:

- twins
- gemelli
- DCDA
- MCDA
- twin DCDA
- twin MCDA
- ...

Which name gets used depends on the doctor and on his habits, mood, or literacy. If an observation has a 1 in any of these variables, it means a twin pregnancy. To make predictions about twins, I need to flag every observation that has a 1 in at least one of these variables (sometimes an observation has a 1 in two of them).

Here is what I tried:

features <- mutate(features,
                   TWIN_P = ifelse("twins" == 1 |
                                      "gemelli" == 1 |
                                      "DCDA" == 1 |
                                      "MCDA" == 1 |
                                      "twin DCDA" == 1 |
                                      "twin MCDA" == 1 , 
                                      "1", "0"))

But when I look at the new variable TWIN_P, I get 0 twins, which is of course impossible.

Can someone tell me what I'm doing wrong? The binary variables are numeric. I tried writing the variable names both with and without quotes, but neither worked.

Thanks in advance!

Don't quote the variable names. You can use backticks (``) for those with spaces. – Edward, Commented Mar 3, 2020 at 15:21
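A minimal sketch of the fix Edward describes, using a small made-up `features` data frame with the six columns from the question (the real data is not shown). Quoting a name turns it into a character literal, so `"twins" == 1` compares the string "twins" with "1" and is always FALSE. Unquoted names, with backticks where a name contains a space, refer to the columns themselves. This sketch also returns numeric 1/0 instead of the character "1"/"0" the original code produced.

```r
library(dplyr)

# Hypothetical toy data; the real `features` data frame is not shown in the question
features <- data.frame(
  twins = c(0, 1, 0, 0),
  gemelli = c(0, 0, 0, 1),
  DCDA = c(0, 0, 0, 0),
  MCDA = c(0, 0, 1, 0),
  `twin DCDA` = c(0, 1, 0, 0),
  `twin MCDA` = c(0, 0, 0, 0),
  check.names = FALSE  # keep the spaces in the last two names
)

# A quoted name is just a string, so this comparison is always FALSE
"twins" == 1

# Unquoted names (backticks for names with spaces) refer to the columns
features <- mutate(features,
                   TWIN_P = ifelse(twins == 1 |
                                     gemelli == 1 |
                                     DCDA == 1 |
                                     MCDA == 1 |
                                     `twin DCDA` == 1 |
                                     `twin MCDA` == 1,
                                   1, 0))
features$TWIN_P
```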

userABC123 · Accepted Answer · 2020-03-03 15:05:24Z

2

I'm not sure exactly what your dataframe looks like, so here's a stand-in:

# Fix the random seed so the stand-in data is reproducible
set.seed(42)

# Ten random 0/1 draws per variable, each 1 with probability 0.2
twins <- rbinom(n=10, size=1, prob=0.2)
gemelli <- rbinom(n=10, size=1, prob=0.2)
DCDA <- rbinom(n=10, size=1, prob=0.2)
MCDA <- rbinom(n=10, size=1, prob=0.2)
twin_DCDA <- rbinom(n=10, size=1, prob=0.2)
twin_MCDA <- rbinom(n=10, size=1, prob=0.2)

df1 <- data.frame(twins, gemelli, DCDA, MCDA, twin_DCDA, twin_MCDA)

Then take rowSums and test which sums are greater than 0, which gives TRUE or FALSE for each row. Wrap that in as.integer to convert TRUE/FALSE into 1/0:

library(dplyr)

df1 %>% 
  mutate(
    # rowSums(.) adds up every column of df1; > 0 means at least one 1
    TWIN_P = as.integer(rowSums(.) > 0)
  )
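`rowSums(.)` sums every column of the data frame, so any other numeric column would be counted too. A minimal base R sketch that restricts the check to a named set of indicator columns, assuming the column names of the stand-in `df1`, a hypothetical extra `age` column, and that a missing value should count as "no":

```r
# Stand-in data: six indicator columns plus an unrelated numeric column
df1 <- data.frame(
  twins = c(0, 0, 1, 0),
  gemelli = c(0, 1, 0, 0),
  DCDA = c(0, 0, 0, 0),
  MCDA = c(0, 0, 1, NA),
  twin_DCDA = c(0, 0, 0, 0),
  twin_MCDA = c(0, 0, 0, 0),
  age = c(31, 28, 35, 40)  # hypothetical non-indicator column
)

twin_cols <- c("twins", "gemelli", "DCDA", "MCDA", "twin_DCDA", "twin_MCDA")

# Compare only the indicator columns with 1, count the matches per row
# (NA comparisons are dropped by na.rm), then flag rows with at least one match
df1$TWIN_P <- as.integer(rowSums(df1[twin_cols] == 1, na.rm = TRUE) > 0)
df1$TWIN_P
```

Without the column subset, `rowSums(df1) > 0` would flag row 1, which has no twin indicator set, just because `age` is positive.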

answered Mar 3, 2020 at 15:05

userABC123


Ben · Accepted Answer · 2020-03-03 16:58:40Z

As @Edward mentioned, don't quote the variable names. Since two of your variable names contain a space, you can wrap them in backticks: `twin DCDA`. In general, try to avoid spaces in column names.
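One way to follow that advice is to rename the columns once, right after loading the data, so no backticks are needed afterwards. A small sketch using base R's `make.names()`, which replaces invalid characters such as spaces with dots; the two-row data frame is made up for illustration:

```r
# Made-up data with the two space-containing names from the question
df <- data.frame(
  `twin DCDA` = c(0, 1),
  `twin MCDA` = c(1, 0),
  check.names = FALSE
)

# Convert to syntactic names: "twin DCDA" becomes "twin.DCDA"
names(df) <- make.names(names(df))
names(df)

# The columns can now be referenced without backticks
df$twin.DCDA == 1 | df$twin.MCDA == 1
```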

Here is another approach in base R that checks whether any column value in a row is 1:

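set.seed(123)

# 36 random 0/1 draws (each 1 with probability 0.1) arranged as 6 rows x 6 columns
df <- data.frame(matrix(rbinom(36, 1, .1), ncol = 6))
colnames(df) = c('twins', 'gemelli', 'DCDA', 'MCDA', 'twin DCDA', 'twin MCDA')

# For each row, TRUE if any column equals 1; as.numeric turns TRUE/FALSE into 1/0
cbind(df, TWIN_P = as.numeric(apply(df, 1, function(x) any(x == 1))))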

Output

  twins gemelli DCDA MCDA twin DCDA twin MCDA TWIN_P
1     0       0    0    0         0         1      1
2     0       0    0    1         0         1      1
3     0       0    0    0         0         0      0
4     0       0    0    0         0         0      0
5     1       1    0    0         0         0      1
6     0       0    0    1         0         0      1
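`apply()` converts the data frame to a matrix and calls the function once per row. An alternative sketch (not from the answer) combines the columns with `Reduce()` and the vectorized `|` operator, so the comparison runs column by column. It assumes every column of `df` is a 0/1 indicator without missing values, as in the example above, and checks that both approaches agree:

```r
set.seed(123)
df <- data.frame(matrix(rbinom(36, 1, .1), ncol = 6))
colnames(df) <- c('twins', 'gemelli', 'DCDA', 'MCDA', 'twin DCDA', 'twin MCDA')

# Row-wise apply() from the answer
via_apply <- as.numeric(apply(df, 1, function(x) any(x == 1)))

# Column-wise alternative: OR the per-column logical vectors together
via_reduce <- as.numeric(Reduce(`|`, lapply(df, function(col) col == 1)))

identical(via_apply, via_reduce)
```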
