I would like to write an R function that adds interaction terms to a formula.
For instance, the function takes the formula mpg ~ cyl + gear + disp, the treatment variable cyl and a character vector of control variables c("gear","disp") and returns mpg ~ cyl + cyl * gear + cyl * disp.
Ideally, the function should return an error if one of the control variables is not in the formula, or if the interaction term is already in the formula.
I came up with the following, which seems to work but relies on string manipulation rather than working with the formula object itself.
I think this makes it slower and more error-prone.
How can I rewrite it from first principles?
#' Add interaction terms in a formula
#'
#' @param form A formula
#' @param treat The treatment variable (string)
#' @param controls A character vector of control variables
#' @return A formula with interaction terms added between `treat` and each variable in `controls`
#' @export
#' @examples
#' reformulas_addints(mpg ~ cyl + gear, "cyl", c("gear"))
#' reformulas_addints(mpg ~ cyl + gear + disp, "cyl", c("gear", "disp"))
#' # The next two calls are expected to fail, so they are wrapped in try()
#' try(reformulas_addints(mpg ~ cyl + gear, "cyl", c("gears")))
#' try(reformulas_addints(mpg ~ cyl + cyl*gear, "cyl", c("gear")))
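reformulas_addints <- function(form, treat, controls) {
  # Deparse to a single string: as.character() on a formula returns a
  # length-3 vector ("~", lhs, rhs), which breaks the if() checks below.
  form_str <- paste(deparse(form), collapse = " ")
  for (control in controls) {
    # Match the control as a whole word so that "gear" does not match "gears".
    ctrl_patt <- paste0("\\b", control, "\\b")
    if (!stringr::str_detect(form_str, ctrl_patt)) {
      stop(paste0("The variable '", control, "' is not in the formula."))
    }
    # Detect an existing `treat * control` term, whatever the spacing.
    patt <- paste0("\\b", treat, r"(\s*\*\s*)", control, "\\b")
    if (stringr::str_detect(form_str, patt)) {
      stop(paste0("The interaction '", treat, " * ", control, "' is already in the formula."))
    }
    # Replace the first occurrence of the control with the interaction.
    form_str <- stringr::str_replace(form_str, ctrl_patt, paste0(treat, " * ", control))
  }
  # Keep the environment of the original formula.
  return(as.formula(form_str, env = environment(form)))
}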
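To see what a non-string approach could build on, here is a small sketch of how base R already parses a formula into terms. It only uses `terms()` from the stats package; the variable names `tt`, `labels` and `same` are just for illustration. It also suggests why `cyl * gear` and `cyl + gear + cyl:gear` can be treated as the same result.

```r
# terms() parses the formula and lists each term on the right-hand side.
# `cyl * gear` is expanded into its main effects plus the cyl:gear interaction.
tt <- terms(mpg ~ cyl * gear + disp)
labels <- attr(tt, "term.labels")
print(labels)

# Both spellings of the interaction give the same set of term labels.
same <- setequal(
  attr(terms(mpg ~ cyl * gear), "term.labels"),
  attr(terms(mpg ~ cyl + gear + cyl:gear), "term.labels")
)
print(same)
```

Checking membership in these labels avoids the partial-match and spacing problems that a regular expression on the deparsed formula has to work around.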
Here are some examples with expected output:
# Expected outputs
reformulas_addints(mpg ~ cyl + gear, "cyl", c("gear"))
# mpg ~ cyl + cyl * gear
# also acceptable
# mpg ~ cyl + gear + cyl:gear
reformulas_addints(mpg ~ cyl + gear + disp, "cyl", c("gear", "disp"))
# mpg ~ cyl + cyl * gear + cyl * disp
# also acceptable
# mpg ~ cyl + gear + disp + cyl:gear + cyl:disp
reformulas_addints(mpg ~ cyl + gear + disp + hp, "cyl", c("gear", "disp"))
# mpg ~ cyl + cyl * gear + cyl * disp + hp
# also acceptable
# mpg ~ cyl + gear + disp + cyl:gear + cyl:disp + hp
# Notice that `hp` is _not_ interacted
reformulas_addints(mpg ~ cyl + gear, "cyl", c("gears"))
# Error: The variable 'gears' is not in the formula.
reformulas_addints(mpg ~ cyl + cyl*gear, "cyl", c("gear"))
# Error: The interaction 'cyl * gear' is already in the formula.
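One possible way to get the behaviour above without pattern matching is sketched below. It is not a definitive implementation: the name `add_interactions` is hypothetical, it assumes `treat` and every control appear as plain main-effect terms, and it returns the "also acceptable" `:` form, with the order of terms decided by `update()`.

```r
# Hypothetical helper: add treat:control interactions using terms() and update().
add_interactions <- function(form, treat, controls) {
  # Term labels of the right-hand side, e.g. "cyl", "gear", "cyl:gear"
  labels <- attr(terms(form), "term.labels")

  # Every control must already be a term of the formula
  missing_controls <- setdiff(controls, labels)
  if (length(missing_controls) > 0) {
    stop("The variable '", missing_controls[1], "' is not in the formula.")
  }

  # The interaction label may list the variables in either order
  for (control in controls) {
    candidates <- c(paste0(treat, ":", control), paste0(control, ":", treat))
    if (any(candidates %in% labels)) {
      stop("The interaction '", treat, " * ", control,
           "' is already in the formula.")
    }
  }

  # Build `~ . + treat:control1 + ...` and let update() merge it into `form`
  new_terms <- paste0(treat, ":", controls)
  update(form, reformulate(c(".", new_terms)))
}

res <- add_interactions(mpg ~ cyl + gear + disp + hp, "cyl", c("gear", "disp"))
print(res)
```

Because `update()` keeps the left-hand side and the environment of the original formula, the result can be passed straight to `lm()`. Offsets and other special terms have not been considered in this sketch.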
[...] `:` as well as with `*`. [...] `update` on that model with the new formula that contains the interactions. That way, when I change the main formula, the formula with the interactions changes too. I guess I could write a function that generates the formula, but would that be overkill?