
I am trying to determine whether each date in one data frame falls within any of several date ranges in another data frame, comparing dates and ranges only within the same ID. I'd then like to update the first data frame with information from the second. Either data frame can have zero or more records for each ID. For example, df1 might look like this:

UID1 ID Date
1    1  05/12/10
2    1  07/25/11
3    1  07/31/12
4    2  11/04/03
5    2  10/06/04
6    3  10/07/08
7    3  06/16/12

and df2 might look like this (note that ID 2 has no records in df2):

UID2 ID StartDate   EndDate
1    1  07/22/09    09/13/09
2    1  03/19/10    11/29/10
3    1  05/09/11    09/04/11
4    3  05/18/12    08/15/12
5    3  01/15/13    04/21/13

I would like to end up with a new df1 that looks like this:

UID1 ID Date        UID2  InRange DaysSinceStart
1    1  05/12/10    2     TRUE    54
2    1  07/25/11    3     TRUE    77
3    1  07/31/12    NA    FALSE   NA
4    2  11/04/03    NA    FALSE   NA
5    2  10/06/04    NA    FALSE   NA
6    3  10/07/08    NA    FALSE   NA
7    3  06/16/12    4     TRUE    29
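
DaysSinceStart is the number of days from the matching range's StartDate to Date, with both endpoints counted as inside the range (as the answer below does with `<=`). A quick base R check of the three matched rows, assuming the dates are month/day/two-digit-year:

```r
# Dates are mm/dd/yy, as in the question
fmt <- "%m/%d/%y"

# The three rows of the desired output that have a match (UID1 = 1, 2, 7)
date  <- as.Date(c("05/12/10", "07/25/11", "06/16/12"), fmt)
start <- as.Date(c("03/19/10", "05/09/11", "05/18/12"), fmt)  # UID2 = 2, 3, 4
end   <- as.Date(c("11/29/10", "09/04/11", "08/15/12"), fmt)

# Inclusive range test, and day count since the start of the range
in_range <- start <= date & date <= end
days_since_start <- as.numeric(date - start)

print(in_range)          # TRUE TRUE TRUE
print(days_since_start)  # 54 77 29
```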

Suggestions?

  • Wondering if there is a way to do this using dplyr and/or the tidyverse? Commented Jun 10, 2020 at 15:06

1 Answer


A suggestion using data.table, with explanations inline as comments.

Data:

library(data.table)

# dates are mm/dd/yy
dt1 <- fread("
UID1 ID Date
1    1  05/12/10
2    1  07/25/11
3    1  07/31/12
4    2  11/04/03
5    2  10/06/04
6    3  10/07/08
7    3  06/16/12
")[, Date:=as.Date(Date, "%m/%d/%y")]

cols <- c("StartDate", "EndDate")
dt2 <- fread("
UID2 ID StartDate   EndDate
1    1  07/22/09    09/13/09
2    1  03/19/10    11/29/10
3    1  05/09/11    09/04/11
4    3  05/18/12    08/15/12
5    3  01/15/13    04/21/13
")[, (cols) := lapply(.SD, function(x) as.Date(x, "%m/%d/%y")), .SDcols=cols]
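The first step of the working code left-joins dt1 with dt2 on ID with `allow.cartesian=TRUE`. To see what that intermediate table contains, here is a sketch of the same left join in base R using `merge()` on plain data frames (the names `df1`, `df2` and `joined` are illustrative): every dt1 row is paired with every range for the same ID, and an ID with no ranges keeps its rows with NA range columns.

```r
fmt <- "%m/%d/%y"

df1 <- data.frame(
  UID1 = 1:7,
  ID   = c(1, 1, 1, 2, 2, 3, 3),
  Date = as.Date(c("05/12/10", "07/25/11", "07/31/12", "11/04/03",
                   "10/06/04", "10/07/08", "06/16/12"), fmt)
)

df2 <- data.frame(
  UID2      = 1:5,
  ID        = c(1, 1, 1, 3, 3),
  StartDate = as.Date(c("07/22/09", "03/19/10", "05/09/11", "05/18/12", "01/15/13"), fmt),
  EndDate   = as.Date(c("09/13/09", "11/29/10", "09/04/11", "08/15/12", "04/21/13"), fmt)
)

# Left join on ID: all.x = TRUE keeps df1 rows whose ID has no ranges
joined <- merge(df1, df2, by = "ID", all.x = TRUE)

# ID 1: 3 dates x 3 ranges = 9 rows; ID 2: 2 rows with NA ranges; ID 3: 2 x 2 = 4
print(nrow(joined))  # 15
```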

The actual work starts here:

# left join: pair every dt1 row with every dt2 range for the same ID
# (IDs absent from dt2, such as ID 2, get NA range columns)
dt <- dt2[dt1, on="ID", allow.cartesian=TRUE]

# check the date against its ID's ranges; one row per dt1 row
# (more than one if a date falls inside overlapping ranges)
res <- dt[, {
        hit <- StartDate <= Date & Date <= EndDate

        if (any(hit, na.rm=TRUE)) {
            # case where Date is within at least one range
            chosen <- which(hit)
            list(UID2=UID2[chosen], StartDate=StartDate[chosen])
        } else {
            # no ranges for this ID, or Date outside all of them
            list(UID2=NA_integer_, StartDate=as.Date(NA))
        }
    }, by=c("UID1","ID","Date")]

# flag matches, count DaysSinceStart, then drop the helper StartDate column
res[, `:=`(InRange=!is.na(UID2),
           DaysSinceStart=as.numeric(Date - StartDate))][,
    StartDate:=NULL]
res
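
If data.table isn't available, the same logic can be sketched in base R: join on ID, keep only the pairs whose Date lies inside [StartDate, EndDate], then merge those matches back onto df1 so unmatched rows get NA. This is an alternative sketch, not the answer's method; the names `df1`, `df2`, `hits` and `res` are illustrative, and as with the data.table version a date inside two overlapping ranges would produce two rows.

```r
fmt <- "%m/%d/%y"

df1 <- data.frame(
  UID1 = 1:7,
  ID   = c(1, 1, 1, 2, 2, 3, 3),
  Date = as.Date(c("05/12/10", "07/25/11", "07/31/12", "11/04/03",
                   "10/06/04", "10/07/08", "06/16/12"), fmt)
)
df2 <- data.frame(
  UID2      = 1:5,
  ID        = c(1, 1, 1, 3, 3),
  StartDate = as.Date(c("07/22/09", "03/19/10", "05/09/11", "05/18/12", "01/15/13"), fmt),
  EndDate   = as.Date(c("09/13/09", "11/29/10", "09/04/11", "08/15/12", "04/21/13"), fmt)
)

# Pair each df1 row with every range for the same ID (left join)
joined <- merge(df1, df2, by = "ID", all.x = TRUE)

# Keep pairs where Date is inside the range; `%in% TRUE` turns the NA
# comparisons (IDs with no ranges) into FALSE
inside <- (joined$StartDate <= joined$Date & joined$Date <= joined$EndDate) %in% TRUE
hits <- joined[inside, c("UID1", "UID2", "StartDate")]

# Bring matches back onto df1; merge() sorts by UID1 by default
res <- merge(df1, hits, by = "UID1", all.x = TRUE)
res$InRange <- !is.na(res$UID2)
res$DaysSinceStart <- as.numeric(res$Date - res$StartDate)
res$StartDate <- NULL
res
```

Regarding the dplyr question in the comments: dplyr 1.1.0 and later support this kind of range join directly, e.g. `left_join(df1, df2, by = join_by(ID, between(Date, StartDate, EndDate)))`.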

1 Comment

Thanks! I'll try that out with my larger dataset.
