I am trying to create a new variable (v2) based on a pattern of numerical responses to another variable (v1). The dataset I am working with is in long format and ordered by visit. I have tried grouping by the 'id' variable and using various combinations of 'summarise' in dplyr, but cannot seem to figure this out. Below is an example of what I would like to achieve.

         id visit    v1    v2
      <dbl> <int> <dbl> <int>
    1 10001     1     0     1
    2 10001     2     0     1
    3 10002     1     0     2
    4 10002     2     1     2
    5 10003     1     1     3
    6 10003     2     0     3

The value of 1 for v2 should reflect a response pattern of 0 across two visits for id 10001, 2 reflects a response pattern of 0/1, and so on.

Thank you in advance for the help!

    How many patterns do you have? 0/0 -> 1, 0/1 -> 2, 1/0 -> 3; what else do you have? Commented Nov 1, 2021 at 20:15

2 Answers

Another way is:

library(dplyr)

dat %>%
    group_by(id) %>%
    mutate(v2 = c("00" = 1, "01" = 2, "10" = 3, "11" = 4)[paste(v1, collapse = "")])
# # A tibble: 6 x 4
# # Groups:   id [3]
#      id visit    v1    v2
#   <int> <int> <int> <dbl>
# 1 10001     1     0     1
# 2 10001     2     0     1
# 3 10002     1     0     2
# 4 10002     2     1     2
# 5 10003     1     1     3
# 6 10003     2     0     3
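
The lookup works because indexing a named vector by a character string returns the element with that name. Here is a minimal base R sketch of just that step. The standalone `v1` vector and the names `lookup`, `key`, `code`, and `missing_code` are for illustration only; inside the grouped `mutate()` the same single value is recycled to every row of the id.

```r
# Named vector mapping each two-visit pattern to its code (same mapping as above)
lookup <- c("00" = 1, "01" = 2, "10" = 3, "11" = 4)

# Illustrative responses for one id, ordered by visit (hypothetical standalone vector)
v1 <- c(0, 1)

# Collapse the responses into a single key such as "01"
key <- paste(v1, collapse = "")

# Index by name; unname() drops the "01" label from the result
code <- unname(lookup[key])

# A pattern that is not in the lookup returns NA rather than an error
missing_code <- unname(lookup["02"])
```

Any pattern missing from the lookup, including one built from an `NA` response, ends up as `NA` in `v2`.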

Assumption:

  • within an id, we always have exactly 2 rows
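
Both approaches below also rely on the rows being ordered by visit within each id, as the question states. Here is a small sketch for checking both conditions before recoding. The names `rows_per_id` and `bad_ids` are illustrative, and the data frame simply repeats the question's example values.

```r
# Example data copied from the question (v2 left out, since it is the target)
dat <- data.frame(
  id    = c(10001L, 10001L, 10002L, 10002L, 10003L, 10003L),
  visit = c(1L, 2L, 1L, 2L, 1L, 2L),
  v1    = c(0L, 0L, 0L, 1L, 1L, 0L)
)

# Sort so that each id's pattern always reads visit 1, then visit 2
dat <- dat[order(dat$id, dat$visit), ]

# Count rows per id; any id listed in bad_ids violates the two-row assumption
rows_per_id <- table(dat$id)
bad_ids <- names(rows_per_id)[rows_per_id != 2]
```

For the example data `bad_ids` is empty, so every id has exactly two rows.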

base R

ave(dat$v1, dat$id, FUN = function(z) {
  if (length(z) != 2) return(NA_integer_)
  switch(paste(z, collapse = ""),
    "00" = 1L, 
    "01" = 2L, 
    "10" = 3L, 
    "11" = 4L, 
    NA_integer_)
})
# [1] 1 1 2 2 3 3

dplyr

library(dplyr)
dat %>%
  group_by(id) %>%
  mutate(v2 = if (n() != 2) NA_integer_ else case_when(
    all(v1 == c(0L, 0L)) ~ 1L, 
    all(v1 == c(0L, 1L)) ~ 2L, 
    all(v1 == c(1L, 0L)) ~ 3L, 
    all(v1 == c(1L, 1L)) ~ 4L, 
    TRUE ~ NA_integer_)
  ) %>%
  ungroup()
# # A tibble: 6 x 4
#      id visit    v1    v2
#   <int> <int> <int> <int>
# 1 10001     1     0     1
# 2 10001     2     0     1
# 3 10002     1     0     2
# 4 10002     2     1     2
# 5 10003     1     1     3
# 6 10003     2     0     3
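
If the codes are meant to follow the ordering 00, 01, 10, 11, the pattern can also be read as a binary number, which avoids spelling out each case. This is an alternative sketch, not something taken from the question. It assumes v1 only ever contains 0 or 1 and that rows are ordered by visit within id.

```r
# Example data copied from the question
dat <- data.frame(
  id    = c(10001L, 10001L, 10002L, 10002L, 10003L, 10003L),
  visit = c(1L, 2L, 1L, 2L, 1L, 2L),
  v1    = c(0L, 0L, 0L, 1L, 1L, 0L)
)

# Collapse each id's responses to a string, parse it as base 2, and add 1
# so that "00" -> 1, "01" -> 2, "10" -> 3, "11" -> 4
v2 <- ave(dat$v1, dat$id, FUN = function(z) {
  strtoi(paste(z, collapse = ""), base = 2L) + 1L
})
```

The same expression would extend to more than two visits per id, at the cost of the codes no longer being documented explicitly in the code.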

Data

dat <- structure(list(id = c(10001L, 10001L, 10002L, 10002L, 10003L, 10003L), visit = c(1L, 2L, 1L, 2L, 1L, 2L), v1 = c(0L, 0L, 0L, 1L, 1L, 0L), v2 = c(1L, 1L, 2L, 2L, 3L, 3L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6"))
