
I have a set of data roughly like this (more data in dput & desired results below):

   id    date       u         v
   <chr> <date>     <chr> <int>
 1 a     2019-05-14 NA        0
 2 a     2018-06-29 u         1
 3 b     2020-12-02 u         1
 4 b     2017-08-16 NA        1
 5 b     2016-04-07 NA        0
 6 c     2018-05-22 u         1
 7 c     2018-05-22 u         1
 8 e     2019-03-06 u         1
 9 e     2019-03-06 NA        1

I am trying to create a new variable pr that identifies, for each row where u == "u", whether another row in the same id group has an equal or earlier date and v == 1 (regardless of that other row's value of u). Rows where u is NA should get pr = NA.

I know generally how to create a new variable based on within-group conditions:

library(dplyr)
x %>% 
  group_by(id) %>% 
  mutate(pr = case_when())  # the conditions are what I'm missing

But I can't figure out how to compare the other dates in the group with the date of the u row, or how to detect a v == 1 row while excluding the u row I am using as the reference. Note that rows with u == "u" always have v == 1.
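For concreteness, the rule can be sketched for a single id group in base R. This is a minimal illustration, not code from the post: pr_direct is a hypothetical helper that assumes u is either "u" or NA, v is 0/1, and date is a Date vector. For each row it drops the row itself, then asks whether any remaining row has v == 1 and a date on or before the reference date.

```r
# Hypothetical helper: compute pr for the rows of ONE id group.
# Assumes u is "u" or NA, v is 0/1, and date is a Date vector.
pr_direct <- function(date, u, v) {
  idx <- seq_along(date)
  out <- vapply(idx, function(i) {
    others <- idx != i  # never count the reference row itself
    # 1 if some other row has v == 1 and a date on or before date[i]
    as.integer(any(v[others] %in% 1 & date[others] <= date[i]))
  }, integer(1))
  out[is.na(u)] <- NA_integer_  # only "u" rows get a 0/1 value
  out
}

# Group "l" from the sample data
l_date <- as.Date(c("2018-05-08", "2018-02-15", "2018-02-15", "2017-11-07", "2015-09-10"))
l_u    <- c("u", NA, "u", NA, NA)
l_v    <- c(1L, 0L, 1L, 0L, 0L)
pr_direct(l_date, l_u, l_v)
#> [1]  1 NA  0 NA NA
```

Row 1 (2018-05-08) gets 1 because the other "u" row, dated 2018-02-15, has v == 1; that 2018-02-15 row gets 0 because the only other v == 1 row in the group is later.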

Expected output is:

   id    date       u         v    pr
   <chr> <date>     <chr> <int> <int>
 1 a     2019-05-14 NA        0    NA
 2 a     2018-06-29 u         1     0
 3 b     2020-12-02 u         1     1
 4 b     2017-08-16 NA        1    NA
 5 b     2016-04-07 NA        0    NA
 6 c     2018-05-22 u         1     1
 7 c     2018-05-22 u         1     1
 8 e     2019-03-06 u         1     1
 9 e     2019-03-06 NA        1    NA
10 f     2020-10-20 u         1     0
11 f     2019-01-25 NA        0    NA
12 h     2020-02-24 NA        0    NA
13 h     2018-10-15 u         1     0
14 h     2018-03-07 NA        0    NA
15 i     2021-02-02 u         1     1
16 i     2020-11-19 NA        1    NA
17 i     2020-11-19 NA        1    NA
18 j     2019-02-11 u         1     1
19 j     2017-06-26 u         1     0
20 k     2018-12-13 u         1     0
21 k     2017-07-18 NA        0    NA
22 l     2018-05-08 u         1     1
23 l     2018-02-15 NA        0    NA
24 l     2018-02-15 u         1     0
25 l     2017-11-07 NA        0    NA
26 l     2015-09-10 NA        0    NA

The format of the variables isn't ideal; if there's anything I can do to clean it up, let me know. The actual data are sensitive, so this is an approximation.

> dput(x)
structure(list(id = c("a", "a", "b", "b", "b", "c", "c", "e", 
"e", "f", "f", "h", "h", "h", "i", "i", "i", "j", "j", "k", "k", 
"l", "l", "l", "l", "l"), date = structure(c(18030, 17711, 18598, 
17394, 16898, 17673, 17673, 17961, 17961, 18555, 17921, 18316, 
17819, 17597, 18660, 18585, 18585, 17938, 17343, 17878, 17365, 
17659, 17577, 17577, 17477, 16688), class = "Date"), u = c(NA, 
"u", "u", NA, NA, "u", "u", "u", NA, "u", NA, NA, "u", NA, "u", 
NA, NA, "u", "u", "u", NA, "u", NA, "u", NA, NA), v = c(0L, 1L, 
1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 
1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L), pr = c(NA, 0L, 1L, NA, NA, 1L, 
1L, 1L, NA, 0L, NA, NA, 0L, NA, 1L, NA, NA, 1L, 0L, 0L, NA, 1L, 
NA, 0L, NA, NA)), row.names = c(NA, -26L), class = c("tbl_df", 
"tbl", "data.frame"))

1 Answer

We can create a function:

library(dplyr)
library(purrr)

f1 <- function(u, v, date) {
  # start with a vector of 0s, one per row of the group
  # (length(u) instead of n(), so the function also works outside dplyr verbs)
  tmp <- rep(0L, length(u))
  # logical vectors: rows where `u` is "u", and rows where `v` is 1
  i1 <- u %in% "u"
  i2 <- v %in% 1
  # for each date where v == 1, check whether it is greater than or equal
  # to `all` of the dates where u == "u", and whether the group has more
  # than one v == 1 row; store the result as 0/1 in those positions of `tmp`
  tmp[i2] <- +(purrr::map_lgl(date[i2], ~ all(.x >= date[i1])) & sum(i2) > 1)
  # rows where `u` is NA get NA
  tmp[is.na(u)] <- NA
  tmp
}

and then apply it after grouping by id:

x1 <- x %>%
   group_by(id) %>%
   mutate(prnew = f1(u, v, date)) %>% 
   ungroup 
> all.equal(x1$pr, x1$prnew)
[1] TRUE
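For readers without dplyr/purrr, here is a minimal base-R port of the same check, offered as a sketch: f1_base is a hypothetical name, vapply() stands in for purrr::map_lgl(), length(u) for n(), and a loop over split() row indices replaces group_by(). One caveat worth testing: f1 compares each v == 1 date against every "u" date in the group instead of searching for an earlier v == 1 row, so while it reproduces the expected pr for this sample, it can give a different answer from the rule stated in the question. The three-row toy group at the end is made up to show such a case.

```r
# Hypothetical base-R port of f1 (no dplyr/purrr needed)
f1_base <- function(u, v, date) {
  tmp <- rep(0L, length(u))
  i1 <- u %in% "u"  # rows flagged "u"
  i2 <- v %in% 1    # rows with v == 1
  # for each v == 1 row: is its date >= every "u" date, and are there 2+ v == 1 rows?
  tmp[i2] <- +(vapply(which(i2), function(k) all(date[k] >= date[i1]), logical(1)) &
                 sum(i2) > 1)
  tmp[is.na(u)] <- NA  # non-"u" rows get NA
  tmp
}

# Sample data from the question (same values as the dput above)
x <- data.frame(
  id = c("a", "a", "b", "b", "b", "c", "c", "e", "e", "f", "f", "h", "h", "h",
         "i", "i", "i", "j", "j", "k", "k", "l", "l", "l", "l", "l"),
  date = structure(c(18030, 17711, 18598, 17394, 16898, 17673, 17673, 17961, 17961,
                     18555, 17921, 18316, 17819, 17597, 18660, 18585, 18585, 17938,
                     17343, 17878, 17365, 17659, 17577, 17577, 17477, 16688),
                   class = "Date"),
  u = c(NA, "u", "u", NA, NA, "u", "u", "u", NA, "u", NA, NA, "u", NA, "u",
        NA, NA, "u", "u", "u", NA, "u", NA, "u", NA, NA),
  v = c(0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 1L,
        1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L),
  pr = c(NA, 0L, 1L, NA, NA, 1L, 1L, 1L, NA, 0L, NA, NA, 0L, NA, 1L, NA, NA, 1L,
         0L, 0L, NA, 1L, NA, 0L, NA, NA),
  stringsAsFactors = FALSE
)

# apply per id group with a plain loop over row indices
x$prnew <- NA_integer_
for (rows in split(seq_len(nrow(x)), x$id)) {
  x$prnew[rows] <- f1_base(x$u[rows], x$v[rows], x$date[rows])
}
all.equal(x$pr, x$prnew)
#> [1] TRUE

# Illustrative toy group (not from the question's data)
toy_date <- as.Date(c("2018-01-01", "2017-01-01", "2019-01-01"))
toy_u    <- c("u", NA, "u")
toy_v    <- c(1L, 1L, 1L)
f1_base(toy_u, toy_v, toy_date)
#> [1]  0 NA  1
```

In the toy group, row 1 is a "u" row with an earlier v == 1 row (row 2), so the question's rule would give 1, but f1's check returns 0 because row 1's date is not on or after the later "u" date in row 3.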