Merging binary variables in R to a new variable

Question

I have three categorical variables (stroke, MI and BP), each coded 0 = yes and 1 = no. I want to combine them into a new variable "cvd", where any row that contains a 0 gets 0 in the new cardiovascular variable and a row of all 1s gets 1. For example:

Stroke  MI  BP  CVD
0       1    1   0
1       1    1   1
1       1    0   0

I tried the following code, but it is not what I want:

transform(koratest, cvd=paste(stroke,MI, BP))

Can someone please suggest a script for this?

Thank you for all the solutions. What should I do if any of the variables to be merged has missing values? I want missing values to be treated as 1, but if a row contains a 0 alongside a missing value, I want the cvd variable to be 0. For example:

Stroke  MI  BP  CVD
0       1    1   0
1       NA   NA  1
0       NA   1   0

How could I achieve such output?

Sotos · Accepted Answer · 2022-07-11 11:39:02Z

2

Try,

(rowSums(df) == ncol(df)) * 1
#[1] 0 1 0

answered Jul 11, 2022 at 11:39


2 Comments

Hasan Sohail Over a year ago

This looks good, thank you. But my dataset has a number of other variables as well. How should I select only these three variables in the script, and where should I define the name of the new variable?

Sotos Over a year ago

If they are the first 3 columns of your data frame, use df[1:3]; otherwise use whatever indices they have in your df.
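To make the column selection concrete, here is a minimal sketch that picks the three columns by name instead of by position and stores the result in a new column. The data frame name koratest comes from the question; the age column and all values are made up for illustration.

```r
# Hypothetical data: the three binary columns plus one unrelated column (age)
koratest <- data.frame(
  age    = c(61, 54, 70),
  stroke = c(0, 1, 1),
  MI     = c(1, 1, 1),
  BP     = c(1, 1, 0)
)

# Select by name so the position of the columns in the data frame does not matter
cols <- c("stroke", "MI", "BP")

# A row gets 1 only when every selected column is 1, otherwise 0;
# the name on the left of <- is where the new variable is defined
koratest$cvd <- as.integer(rowSums(koratest[cols]) == length(cols))

koratest$cvd
#> [1] 0 1 0
```

Selecting by name is usually safer than df[1:3], since it keeps working if columns are later reordered or added.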

Shafee · Accepted Answer · 2022-07-11 11:46:31Z

1

Another way:

library(dplyr)

df <- data.frame(Stroke = c(0,1,1),
                   MI = c(1,1,1),
                   BP = c(1,1,0))

df %>% 
  rowwise() %>% 
  mutate(
    CVD = min(Stroke, MI, BP) 
  ) %>% 
  ungroup()

#> # A tibble: 3 × 4
#>   Stroke    MI    BP   CVD
#>    <dbl> <dbl> <dbl> <dbl>
#> 1      0     1     1     0
#> 2      1     1     1     1
#> 3      1     1     0     0

Created on 2022-07-11 by the reprex package (v2.0.1)

answered Jul 11, 2022 at 11:46


J.Li · Accepted Answer · 2022-07-11 11:53:49Z

1

I don't know how your variables are arranged. If they are separate vectors, this should work:

Stroke = c(0,1,1)
MI = c(1,1,1)
BP = c(1,1,0)
CVD = as.numeric(Stroke & MI & BP)

If a data.frame:

df$CVD = with(df, as.numeric(Stroke & MI & BP))

Or the solutions mentioned by others.

answered Jul 11, 2022 at 11:53


Mohamed Desouky · Accepted Answer · 2022-07-11 11:58:05Z

1

Try this, using dplyr's rowwise() function:

library(dplyr)

df |> rowwise() |> mutate(CVD = if(all(c_across(everything()) == 1)) 1 else 0) |> ungroup()

output

# A tibble: 3 × 4
  Stroke    MI    BP   CVD
   <int> <int> <int> <dbl>
1      0     1     1     0
2      1     1     1     1
3      1     1     0     0

edited Jul 11, 2022 at 11:58

answered Jul 11, 2022 at 11:47


1 Comment

Shafee Over a year ago

You may want to add an ungroup() so that the result is an ordinary tbl_df object.

Orlando Sabogal · Accepted Answer · 2022-07-11 11:42:34Z

0

Maybe this:

library(tidyverse)

Data <- data.frame(Stroke = c(0,1,1),
                   MI = c(1,1,1),
                   BP = c(1,1,0))

Data <- Data %>% 
  mutate(CVD = if_else(Stroke == 1 & MI == 1 & BP == 1, 1, 0))

answered Jul 11, 2022 at 11:42


Quinten · Accepted Answer · 2022-07-11 11:47:57Z

0

base R option:

# MARGIN = 1 applies the function to each row (MARGIN = 2 would work column by column)
df$CVD <- apply(df, 1, function(x) !(0 %in% x)) + 0
df

Output:

  Stroke MI BP CVD
1      0  1  1   0
2      1  1  1   1
3      1  1  0   0

answered Jul 11, 2022 at 11:47


1 Comment

Hasan Sohail Over a year ago

Another aspect to this question: how do I deal with missing values in the variables to be merged? I have edited the details into the question.
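For the missing-value case described in the edited question, one possible approach (a sketch, not taken from any of the answers here) is to count the observed zeros per row and ignore NA while counting. The data below reproduces the example table from the question; the names cols and n_zero are only illustrative.

```r
# Example data from the edited question (NA = missing)
df <- data.frame(
  Stroke = c(0, 1, 0),
  MI     = c(1, NA, NA),
  BP     = c(1, NA, 1)
)

cols <- c("Stroke", "MI", "BP")

# df[cols] == 0 is a logical matrix; NA cells stay NA and are skipped by na.rm = TRUE
n_zero <- rowSums(df[cols] == 0, na.rm = TRUE)

# CVD is 0 when the row has at least one observed 0, otherwise 1
df$CVD <- as.integer(n_zero == 0)

df$CVD
#> [1] 0 1 0
```

Note that under this rule a row in which all three values are missing also gets 1, which may or may not be what you want.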

jay.sf · Accepted Answer · 2022-07-11 12:27:02Z

0

Use rowSums inside cbind; cbind detects that dat is a data frame and returns a data frame.

cbind(dat, CVD=+(rowSums(dat[c('Stroke', 'MI', 'BP')]) == 3))
#   Stroke MI BP CVD
# 1      0  1  1   0
# 2      1  1  1   1
# 3      1  1  0   0

If you only have these columns, it simplifies to:

cbind(dat, CVD=+(rowSums(dat) == 3))

Data:

dat <- structure(list(Stroke = c(0L, 1L, 1L), MI = c(1L, 1L, 1L), BP = c(1L, 
1L, 0L)), class = "data.frame", row.names = c(NA, -3L))

answered Jul 11, 2022 at 12:27


B. Christian Kamgang · Accepted Answer · 2022-07-11 14:28:53Z

0

Another way to solve your problem:

df$CVD = with(df, pmin(Stroke, MI, BP))

#   Stroke MI BP CVD
# 1      0  1  1   0
# 2      1  1  1   1
# 3      1  1  0   0

# or
library(data.table)

setDT(df)[, CVD := pmin(Stroke, MI, BP)]

# or
library(dplyr)

df = df %>% 
  mutate(CVD = pmin(Stroke, MI, BP))
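Regarding the missing-value follow-up in the question: pmin() has an na.rm argument, so the same idea might be extended as in the sketch below. The data reproduces the example from the edited question; this is an illustration under that assumption rather than part of the original answer.

```r
# Example data from the edited question (NA = missing)
df <- data.frame(
  Stroke = c(0, 1, 0),
  MI     = c(1, NA, NA),
  BP     = c(1, NA, 1)
)

# na.rm = TRUE makes pmin() ignore NA within each row,
# so the result is the smallest observed value per row
df$CVD <- with(df, pmin(Stroke, MI, BP, na.rm = TRUE))

df$CVD
#> [1] 0 1 0
```

A row in which all three values are missing still yields NA here, so such rows would need separate handling if they should be labelled 1.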

edited Jul 11, 2022 at 14:28

answered Jul 11, 2022 at 12:35
