I have two datasets that I want to join and then filter, group by group, where a group is defined by (id, code). Within each group, if the latest end_date is earlier than date, I want to keep only the rows with the latest end_date. Otherwise, I want to keep the rows where date falls between start_date and end_date.

I have coded this using dplyr and it works - see below.

library(dplyr)

left_join(df, df_match, by = "id") %>%
  group_by(id, code) %>%
  # TRUE where the group's latest end_date is earlier than date
  mutate(is.max = max(end_date) < date) %>%
  filter(case_when(
    is.max ~ end_date == max(end_date),
    !is.max ~ date >= start_date & date <= end_date
  ))

However, this code is very slow for my 1+ million row datasets. I am curious if it's possible to achieve the same thing using data.table, which is usually much faster?

  • Interesting question but, please, provide a minimal reproducible example including the expected result. As the question currently is written, it is unclear which column is originating from which data.frame. To find clever answers it is necessary to get the full picture and also to have some data for testing. Commented Oct 5, 2021 at 16:10

1 Answer

I can't test this without data, but a data.table translation of the dplyr code would be:

library(data.table)

setDT(df)
setDT(df_match)

res <- merge(df, df_match, by = 'id')

res[, .SD[if(max(end_date) < date) end_date == max(end_date) else 
  date >= start_date & date <= end_date],  .(id, code)]

2 Comments

Thank you. I'm getting an error, though, that In if (max(end_date) < date) end_date == max(end_date) else date >= ... : the condition has length > 1 and only the first element will be used.
Try wrapping the condition in any(), i.e. if (any(max(end_date) < date, na.rm = TRUE)).
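
The message appears because if() expects a single TRUE/FALSE per group, while max(end_date) < date yields one value per row. If date can differ between rows of the same group, a row-wise test may match the dplyr case_when() more closely than any(). Below is a minimal sketch of that idea, not a tested solution for the real data: the toy tables and the choice of which column lives in which table are assumptions, since the question does not say, and max_end is a hypothetical helper column.

```r
library(data.table)

# Hypothetical toy data; the column layout is an assumption.
df <- data.table(id = c(1L, 2L),
                 date = as.IDate(c("2021-06-15", "2021-06-15")))
df_match <- data.table(
  id         = c(1L, 1L, 2L, 2L),
  code       = c("a", "a", "b", "b"),
  start_date = as.IDate(c("2021-01-01", "2021-06-01", "2020-01-01", "2020-07-01")),
  end_date   = as.IDate(c("2021-05-31", "2021-12-31", "2020-06-30", "2020-12-31"))
)

# all.x = TRUE keeps unmatched rows of df, as left_join() does
res <- merge(df, df_match, by = "id", all.x = TRUE)

# Store the group maximum as a column, then filter all rows in one
# vectorised step instead of subsetting .SD group by group
res[, max_end := max(end_date), by = .(id, code)]
out <- res[fifelse(max_end < date,
                   end_date == max_end,
                   date >= start_date & date <= end_date)]
out[, max_end := NULL]  # drop the helper column
print(out)
```

On this toy data two rows remain: for id 1 the row whose interval contains date, and for id 2 the row with the latest end_date. Computing the maximum once with := and filtering in a single pass also avoids the per-group .SD subsetting, which may help on large tables, though that is untested here.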
