I have data roughly like this (the full data via dput and the desired results are below):
  id    date       u         v
  <chr> <date>     <chr> <int>
1 a     2019-05-14 NA        0
2 a     2018-06-29 u         1
3 b     2020-12-02 u         1
4 b     2017-08-16 NA        1
5 b     2016-04-07 NA        0
6 c     2018-05-22 u         1
7 c     2018-05-22 u         1
8 e     2019-03-06 u         1
9 e     2019-03-06 NA        1
I am trying to create a new variable, pr, that flags, for each row where u == "u", whether another row in the same id group has v == 1 on an equal or earlier date (regardless of that other row's value of u). Rows where u is NA should get pr = NA.
I know, in general, how to create a new variable based on within-group conditions:
library(dplyr)
x %>%
  group_by(id) %>%
  mutate(pr = case_when())
But I can't figure out how to compare the other dates within the group to the date of the u row, or how to check for v == 1 while excluding the u row I am using as the reference. Note that a u row always has v == 1, so the reference row must not count as its own match.
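To make the rule concrete, here is a minimal brute-force sketch of the check on the first nine rows of the sample. It is only one possible way to express the logic, not necessarily the most efficient one. The names x_small and has_prior_v are hypothetical and introduced purely for illustration; the helper drops row i before testing, so a u row never matches itself.

```r
library(dplyr)

# First nine rows of the sample data shown above (x_small is a hypothetical name)
x_small <- data.frame(
  id   = c("a", "a", "b", "b", "b", "c", "c", "e", "e"),
  date = as.Date(c("2019-05-14", "2018-06-29", "2020-12-02", "2017-08-16",
                   "2016-04-07", "2018-05-22", "2018-05-22", "2019-03-06",
                   "2019-03-06")),
  u    = c(NA, "u", "u", NA, NA, "u", "u", "u", NA),
  v    = c(0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L)
)

# Hypothetical helper: for each row i, is there a *different* row in the
# same group with v == 1 and a date on or before row i's date?
has_prior_v <- function(date, v) {
  vapply(
    seq_along(date),
    function(i) any(v[-i] == 1 & date[-i] <= date[i]),
    logical(1)
  )
}

res <- x_small %>%
  group_by(id) %>%
  # Only rows with a non-missing u get a 0/1 flag; all others stay NA
  mutate(pr = if_else(!is.na(u),
                      as.integer(has_prior_v(date, v)),
                      NA_integer_)) %>%
  ungroup()

res
```

Because the helper compares every row against every other row in its group, the cost grows quadratically with group size. That should be fine for small groups like these, but it is an assumption that the real data has similarly small groups.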
Expected output is:
   id    date       u         v    pr
   <chr> <date>     <chr> <int> <int>
 1 a     2019-05-14 NA        0    NA
 2 a     2018-06-29 u         1     0
 3 b     2020-12-02 u         1     1
 4 b     2017-08-16 NA        1    NA
 5 b     2016-04-07 NA        0    NA
 6 c     2018-05-22 u         1     1
 7 c     2018-05-22 u         1     1
 8 e     2019-03-06 u         1     1
 9 e     2019-03-06 NA        1    NA
10 f     2020-10-20 u         1     0
11 f     2019-01-25 NA        0    NA
12 h     2020-02-24 NA        0    NA
13 h     2018-10-15 u         1     0
14 h     2018-03-07 NA        0    NA
15 i     2021-02-02 u         1     1
16 i     2020-11-19 NA        1    NA
17 i     2020-11-19 NA        1    NA
18 j     2019-02-11 u         1     1
19 j     2017-06-26 u         1     0
20 k     2018-12-13 u         1     0
21 k     2017-07-18 NA        0    NA
22 l     2018-05-08 u         1     1
23 l     2018-02-15 NA        0    NA
24 l     2018-02-15 u         1     0
25 l     2017-11-07 NA        0    NA
26 l     2015-09-10 NA        0    NA
The format of the variables isn't ideal; if there's any way I can help clean it up, let me know. The actual data is sensitive, so this is an approximation.
> dput(x)
structure(list(id = c("a", "a", "b", "b", "b", "c", "c", "e",
"e", "f", "f", "h", "h", "h", "i", "i", "i", "j", "j", "k", "k",
"l", "l", "l", "l", "l"), date = structure(c(18030, 17711, 18598,
17394, 16898, 17673, 17673, 17961, 17961, 18555, 17921, 18316,
17819, 17597, 18660, 18585, 18585, 17938, 17343, 17878, 17365,
17659, 17577, 17577, 17477, 16688), class = "Date"), u = c(NA,
"u", "u", NA, NA, "u", "u", "u", NA, "u", NA, NA, "u", NA, "u",
NA, NA, "u", "u", "u", NA, "u", NA, "u", NA, NA), v = c(0L, 1L,
1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 1L,
1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L), pr = c(NA, 0L, 1L, NA, NA, 1L,
1L, 1L, NA, 0L, NA, NA, 0L, NA, 1L, NA, NA, 1L, 0L, 0L, NA, 1L,
NA, 0L, NA, NA)), row.names = c(NA, -26L), class = c("tbl_df",
"tbl", "data.frame"))