I apologize if this is a duplicate or a bit confusing - I've searched all around SO but can't seem to apply find what I'm trying to accomplish. I haven't used functions/loops extensively, especially writing from scratch, so I'm not sure if the error is from the function (likely) or from the construct of the data. The basic flow as follows:
Dummy data set - grouping, type, rate, years, months
I'm running lm formula on the data set by grouping with this bit:
coef_models <- test_coef %>% group_by(Grouping) %>% do(model = lm(rate ~ years + months, data = .))
The result of the above gives me intercepts and coefficients for the variables - what I'm trying to accomplish next (and failing) is for all the coefficients for the estimates that are negative, drop that component out of the equation and rerun the lm with just the positive coefficient. So for example a grouping of states, if the years coefficient is negative, I would want to run lm(rate ~ months, data = . with in the formula.
To get there, with plyr/broom, I'm taking the results and putting them into a data frame:
#removed lines with negative coefficients
library(dplyr)
library(broom)
coef_output_test <- as.data.frame(coef_models %>% tidy(model))
coef_output_test$Grouping <- as.character(coef_output_test$Grouping)
#drop these coefficients and rerun
coef_output_test_rerun <- coef_output_test[!(coef_output_test$estimate >= 0),]
From here, I'm trying to rerun the groupings with issues without the negative variable from the initial run. Because the variables will vary, some instances will be years dropping out, some will be months, I need to pass through the correct column to use. I think this is where I'm getting hung up:
lm_test_rerun_out <- data.frame(grouping=character()
, '(intercept)'=double()
, term=character()
, estimate=double()
, stringsAsFactors=FALSE)
lm_test_rerun <- function(r) {
y = coef_output_test_rerun$Grouping
x = coef_output_test_rerun$term
for (i in 2:nrow(coef_output_test_rerun)){
lm_test_rerun_out <- test_coef %>% group_by(Grouping["y"]) %>% do(model = lm(rate ~ x, data = .))
}
}
lm_test_rerun(coef_output_test_rerun)
I get this error:
variable lengths differ (found for 'x')
The output for function should be something like this dummy output:
Grouping, Term, (intercept), Estimate
Sports, Years, 0.56, 0.0430
States, Months, 0.67, 0.340
I'm surely not fluent in R, and I'm sure the parts above that do work could be done more efficiently, but the output of the function should be the grouping and x variable used, along with the intercept and estimate for each. Ultimately I'll be taking that output and appending back to the original 'coef_models' - but I can't get past this part for now.
EDIT: sample test_coef set
Grouping Drilldown Years Months Rate
Sports Basketball 10 23 0.42
Sports Soccer 13 18 0.75
Sports Football 9 5 0.83
Sports Golf 13 17 0.59
States CA 13 20 0.85
States TX 14 9 0.43
States AK 14 10 0.63
States AR 10 5 0.60
States ID 18 2 0.22
Countries US 8 19 0.89
Countries CA 9 19 0.86
Countries UK 2 15 0.64
Countries MX 21 15 0.19
Countries AR 8 11 0.62
dput(test_coef)and post the results here to make it reproducible?lm_test_rerunisrbut you never use it in the function. Maybe you meantlm_test_rerun <- function(coef_output_test_rerun){...instead.