0

Here is the story:

I have a data set with many binary variables (1 = yes and 0 = no). Many of those variables mean the same thing but are just written differently. Example:

  • twins
    • gemelli
    • DCDA
    • MCDA
    • twin DCDA
    • twin MCDA
    • ...

It all depends on the doctor, his habits, his mood, his literacy. If an observation has a "1" in any of the above variables, it means a twin pregnancy. To make predictions about twins, I need to group all observations that have a "1" in any of those variables (sometimes even in two of them).

Here is what I tried:

features <- mutate(features,
                   TWIN_P = ifelse("twins" == 1 |
                                      "gemelli" == 1 |
                                      "DCDA" == 1 |
                                      "MCDA" == 1 |
                                      "twin DCDA" == 1 |
                                      "twin MCDA" == 1 , 
                                      "1", "0"))

But when I look at the new variable TWIN_P I get 0 twins... Which is of course impossible.

Can someone tell me what I'm doing wrong? The binary variables are numeric. I tried this with and without the quotes, but neither worked.

Thanks in advance!

1
  • 1
    Don't quote the variable names. You can use backticks `` for those with spaces. Commented Mar 3, 2020 at 15:21

2 Answers

2

I'm not sure exactly what your dataframe looks like, so here's a stand-in:

twins <- rbinom(n=10, size=1, prob=0.2) 
gemelli <- rbinom(n=10, size=1, prob=0.2)
DCDA <- rbinom(n=10, size=1, prob=0.2)
MCDA <- rbinom(n=10, size=1, prob=0.2)
twin_DCDA <- rbinom(n=10, size=1, prob=0.2)
twin_MCDA <- rbinom(n=10, size=1, prob=0.2)

df1 <- data.frame(twins, gemelli, DCDA, MCDA, twin_DCDA, twin_MCDA)

Then take rowSums() over those columns and test whether each row's sum is greater than 0, which gives TRUE or FALSE per row. Wrap that in as.integer(), which converts TRUE/FALSE into 1/0:

library(dplyr)

df1 %>% 
  mutate(
    # TRUE if the row contains at least one 1, then converted to 1/0
    TWIN_P = as.integer(rowSums(.) > 0)
  )
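In the real data set there are presumably other columns besides the twin-related ones, so summing over every column with rowSums(.) only works for the stand-in above. A minimal base R sketch that restricts the sum to the twin columns, assuming they are the six named in the question (the data and the `other_var` column are made up for illustration):

```r
# Made-up example data; other_var stands for any unrelated column
df1 <- data.frame(
  twins     = c(0, 1, 0),
  gemelli   = c(0, 0, 0),
  DCDA      = c(0, 0, 1),
  MCDA      = c(0, 0, 0),
  twin_DCDA = c(0, 1, 0),
  twin_MCDA = c(0, 0, 0),
  other_var = c(1, 1, 1)
)

# Names of the columns that all mean "twin pregnancy"
twin_cols <- c("twins", "gemelli", "DCDA", "MCDA", "twin_DCDA", "twin_MCDA")

# Sum only the twin columns per row; a sum above 0 means at least one "1"
df1$TWIN_P <- as.integer(rowSums(df1[twin_cols]) > 0)
df1$TWIN_P
```

If the twin columns can contain NA, rowSums() also accepts na.rm = TRUE, which ignores missing values; whether that is appropriate depends on what a missing entry means in your data.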

1

As mentioned by @Edward, don't quote the variable names. Since two of your variable names contain a space, wrap those in backticks, e.g. `twin DCDA`. In general, it is best to avoid spaces in column names.
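To see why the original attempt returned only zeros: "twins" == 1 compares the literal string "twins" with the number 1, which is always FALSE, so every row falls through to "0". Here is a minimal sketch of the corrected call, assuming the column names from the question. The `features` data frame is a made-up stand-in, and the result is numeric 1/0 rather than the strings "1"/"0":

```r
library(dplyr)

# Made-up stand-in for the asker's data; check.names = FALSE keeps the spaces
features <- data.frame(
  twins       = c(0, 1, 0, 0),
  gemelli     = c(0, 0, 0, 1),
  DCDA        = c(0, 0, 0, 0),
  MCDA        = c(0, 0, 0, 0),
  `twin DCDA` = c(0, 1, 0, 0),
  `twin MCDA` = c(0, 0, 0, 0),
  check.names = FALSE
)

# A quoted name is just a string, so this comparison is FALSE
"twins" == 1

# Unquoted names (backticks for names with spaces) refer to the columns
features <- mutate(features,
                   TWIN_P = ifelse(twins == 1 |
                                     gemelli == 1 |
                                     DCDA == 1 |
                                     MCDA == 1 |
                                     `twin DCDA` == 1 |
                                     `twin MCDA` == 1,
                                   1, 0))
features$TWIN_P
```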

In addition, here is a base R approach that checks whether any column value in a row is 1:

set.seed(123)

df <- data.frame(matrix(rbinom(36, 1, .1), ncol = 6))
colnames(df) <- c('twins', 'gemelli', 'DCDA', 'MCDA', 'twin DCDA', 'twin MCDA')

cbind(df, TWIN_P = as.numeric(apply(df, 1, function(x) any(x == 1))))

Output

  twins gemelli DCDA MCDA twin DCDA twin MCDA TWIN_P
1     0       0    0    0         0         1      1
2     0       0    0    1         0         1      1
3     0       0    0    0         0         0      0
4     0       0    0    0         0         0      0
5     1       1    0    0         0         0      1
6     0       0    0    1         0         0      1
