0

I have dataset with multiple timepoints for e.g. col1, col2, col3, col4 with each column having similar categories (A,B,C,D)

|ID  | col1 | col2| col3| col4| ColA
------------------------------
|1   |A     |A    |B    |B
------------------------------
|2   |A     |B    |B    |C
-------------------------------
|3   |B     |B    |C    |C
-------------------------------
|4   |B     |C    |C    |D
-------------------------------
|5   |C     |C    |D    |D
--------------------------------

Now I want to create a new variable ("colA") such that if A is found in any of the col1-col4 then it will be assigned value 1 else value 0. Similar for 1 for B new in "colB", C for "colC", and D for "colD". Here ColA will have 1,1,0,0,0 and ColB will have 1,1,1,1,0 values so on.

I tried multiple options which I found here For eg. (without loop)

df %>%
 mutate(colA=case_when(starts_with("col")=="A") ~ 1, TRUE~0)

or

(with a loop)

for (I in 1:4)  {
  df[,"colA"] <- ifelse(unlist(df[,paste0("col",i)]=="A",1,0)
}

but it did not work. in for loop, the data gets overwritten at the end of the loop, so colA reads as 0,0,0,0,0 as there are no A's in col4. I am using multiple datasets where range of i may vary from 1:2 to 1:4.

structure(list(id = 1:6, col1 = c(age11 = "A", age12 = "A", age13 = "B", 
age14 = "A", age15 = "C", age16 = "B"), col2 = c(age21 = "A", 
age22 = NA, age23 = "C", age24 = "A", age25 = "C", age26 = "B"
), col3 = c(age31 = "B", age32 = "B", age33 = NA, age34 = NA, 
age35 = "C", age36 = "B"), col4 = c(age41 = "B", age42 = NA, 
age43 = NA, age44 = NA, age45 = "D", age46 = "C")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))
2
  • It will be easier for people to help if you can share your data in a way that's easier for other people to load. Can you pls run dput(head(MY_DATA)) and paste the output into the body of your question? That will create a code recipe to make an exact copy of the first 6 rows of your data. Commented Jul 14, 2022 at 19:44
  • Just edited the post Commented Jul 14, 2022 at 20:07

3 Answers 3

1
library(dplyr, warn.conflicts = FALSE)

df %>% 
  mutate(colA = coalesce(if_any(col1:col4, `==`, 'A'), 0))
#> # A tibble: 6 × 6
#>      id col1  col2  col3  col4   colA
#>   <int> <chr> <chr> <chr> <chr> <dbl>
#> 1     1 A     A     B     B         1
#> 2     2 A     <NA>  B     <NA>      1
#> 3     3 B     C     <NA>  <NA>      0
#> 4     4 A     A     <NA>  <NA>      1
#> 5     5 C     C     C     D         0
#> 6     6 B     B     B     C         0

Created on 2022-07-14 by the reprex package (v2.0.1)

Sign up to request clarification or add additional context in comments.

Comments

0
df %>%
   pivot_longer(-ID, names_to = 'value', values_to = 'name') %>%
   pivot_wider(values_fn = length, values_fill = 0, names_prefix = 'col')%>%
   mutate(across(-ID, ~+(.>0)))%>%
   right_join(df, 'ID')

# A tibble: 5 x 9
     ID  colA  colB  colC  colD col1  col2  col3  col4 
  <int> <int> <int> <int> <int> <chr> <chr> <chr> <chr>
1     1     1     1     0     0 A     A     B     B    
2     2     1     1     1     0 A     B     B     C    
3     3     0     1     1     0 B     B     C     C    
4     4     0     1     1     1 B     C     C     D    
5     5     0     0     1     1 C     C     D     D 

Comments

0

in Base R:

cbind(df, +(table(cbind(df['ID'], unlist(df[-1])))>1))

  ID col1 col2 col3 col4 A B C D
1  1    A    A    B    B 1 1 0 0
2  2    A    B    B    C 0 1 0 0
3  3    B    B    C    C 0 1 1 0
4  4    B    C    C    D 0 0 1 0
5  5    C    C    D    D 0 0 1 1

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.