Conditional filtering in groups using data.table in R

Question

I have two datasets that I want to join and then filter by group (id, code). Within each group, if the latest end_date is earlier than date, I want to keep only the rows with the latest end_date. Otherwise, I want to keep the rows where date falls between the two columns start_date and end_date.

I have coded this using dplyr and it works - see below.

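library(dplyr)

left_join(df, df_match, by = 'id') %>%
  group_by(id, code) %>%
  # TRUE on rows whose date is later than the group's latest end_date
  mutate(is.max = max(end_date) < date) %>%
  filter(case_when(
    is.max  ~ end_date == max(end_date),              # keep the latest end_date
    !is.max ~ date >= start_date & date <= end_date   # keep ranges that cover date
  ))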

However, this code is very slow for my 1+ million row datasets. I am curious if it's possible to achieve the same thing using data.table, which is usually much faster?

Interesting question, but please provide a minimal reproducible example including the expected result. As the question is currently written, it is unclear which column originates from which data.frame. To find clever answers, it is necessary to get the full picture and to have some data for testing.
– Uwe, Commented Oct 5, 2021 at 16:10

Ronak Shah · Accepted Answer · 2021-10-05 11:24:11Z

I can't test this without data, but a data.table translation of the dplyr code would be:

library(data.table)

setDT(df)
setDT(df_match)

res <- merge(df, df_match, by = 'id')

res[, .SD[if(max(end_date) < date) end_date == max(end_date) else 
  date >= start_date & date <= end_date],  .(id, code)]

2 Comments

fifigoblin Over a year ago

Thank you. I'm getting an error, though:

In if (max(end_date) < date) end_date == max(end_date) else date >=  ... : the condition has length > 1 and only the first element will be used

Ronak Shah Over a year ago

Try wrapping the condition in any(), i.e. if (any(max(end_date) < date, na.rm = TRUE))
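
For context on the message quoted in the comments: `if` expects a single TRUE/FALSE, while `max(end_date) < date` yields one value per row of the group (R 4.2.0 and later turn this into an error). Wrapping the condition in `any()` makes it run, but it then picks one branch for the whole group, whereas the `case_when()` in the question decides row by row. A possible alternative is a fully vectorised filter, sketched below. The toy data, the split of columns between `df` and `df_match`, and the helper column `max_end` are assumptions made for illustration, not taken from the question.

```r
library(data.table)

# Hypothetical toy data: which columns live in which table is an assumption
df <- data.table(
  id   = c(1L, 2L),
  date = as.Date(c("2021-06-15", "2021-06-15"))
)
df_match <- data.table(
  id         = c(1L, 1L, 2L, 2L),
  code       = c("a", "a", "b", "b"),
  start_date = as.Date(c("2021-01-01", "2021-06-01", "2020-01-01", "2020-07-01")),
  end_date   = as.Date(c("2021-05-31", "2021-12-31", "2020-06-30", "2020-12-31"))
)

# Same join as in the answer above
res <- merge(df, df_match, by = "id")

# Latest end_date per group, added by reference (max_end is a made-up helper name)
res[, max_end := max(end_date), by = .(id, code)]

# Row-wise decision, mirroring the two case_when() branches of the dplyr code
out <- res[(max_end < date & end_date == max_end) |
           (max_end >= date & date >= start_date & date <= end_date)]

# Drop the helper column again
out[, max_end := NULL]
```

With this toy data, group (1, a) keeps the row whose range covers date, and group (2, b) keeps the row with the latest end_date. Rows where the condition evaluates to NA are dropped, since data.table treats NA in a logical i as FALSE, much like `filter()` does. Computing the group maximum once with `:=` and then filtering in a single pass avoids evaluating `.SD[...]` per group, which may help on large tables, though that would need to be measured on the real data.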
