R: iterating over variable names using a loop or function

Question

I want to loop over variables within a data frame either using a for loop or function in R. I have coded the following (which doesn't work):

y <- c(0,0,1,1,0,1,0,1,1,1)
var1 <- c("a","a","a","b","b","b","c","c","c","c")
var2 <- c("m","m","n","n","n","n","o","o","o","m")

mydata <- data.frame(y,var1,var2)

myfunction <- function(v){
  regressionresult <- lm(y ~ v, data = mydata)
  summary(regressionresult) 
}
myfunction("var1")

When I try running this, I get the error message:

Error in model.frame.default(formula = y ~ v, data = mydata, drop.unused.levels = TRUE) : variable lengths differ (found for 'v')

I don't think this is a problem with the data, but with how I refer to the variable name because the following code produces the desired regression results (for one variable that I wanted to loop over):

regressionresult <- lm(y ~ var1, data = mydata)
summary(regressionresult)

How can I fix the function, or put the variable names in the loop?

I also tried to loop over the variable names, but had a similar problem as with the function:

for(v in c("var1","var2")){
  regressionresult <- lm(y ~ v, data = mydata)
  summary(regressionresult)  
}

When running this loop, it produces the error:

Error in model.frame.default(formula = y ~ v, data = mydata, drop.unused.levels = TRUE) : 
  variable lengths differ (found for 'v')

Thanks for your help!

Replace your lm line with regressionresult <- lm(y ~ get(v), data = mydata)
(comment by Ronak Shah, Jun 5, 2018 at 5:31)
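A short sketch of why the original call fails and what the get(v) suggestion changes. In y ~ v, lm finds no column called v in mydata, so it falls back to the function argument v, which is the single string "var1" rather than a column of 10 values; hence "variable lengths differ". With get(v), the string is used to look up the column of that name. The function names broken and fit_with_get are hypothetical and only used for this illustration; the data frame is built without global copies of the columns so the lookup clearly goes through mydata.

```r
# Data from the question, stored only inside the data frame
mydata <- data.frame(
  y    = c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1),
  var1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c"),
  var2 = c("m", "m", "n", "n", "n", "n", "o", "o", "o", "m")
)

# Original approach: v is a length-one string, not a column, so lm() errors
broken <- function(v) lm(y ~ v, data = mydata)
msg <- tryCatch(broken("var1"), error = function(e) conditionMessage(e))
print(msg)

# Suggested fix: get(v) fetches the column whose name is stored in v
fit_with_get <- function(v) {
  lm(y ~ get(v), data = mydata)
}

fit <- fit_with_get("var1")
summary(fit)
```

One side effect of this approach is that the coefficients are labelled with the literal text get(v) (for example get(v)b) instead of the variable name, which can make output from several models harder to tell apart.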

akrun · Accepted Answer · 2018-06-05 05:43:18Z


We can use paste0 to build the formula as a string and pass it to lm, which coerces it to a formula

myfunction <- function(v){
  regressionresult <- lm(paste0('y ~', v), data = mydata)
  summary(regressionresult) 
}
out1 <- myfunction("var1")

Or use glue::glue

myfunction <- function(v){
  regressionresult <- lm(glue::glue('y ~ {v}'), data = mydata)
  summary(regressionresult) 
}
myfunction("var1")
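The question also asks about looping over several variable names. A minimal sketch, assuming base R only: reformulate() builds the formula from a string (an alternative to paste0 or glue, swapped in here for illustration), and lapply() collects one summary per variable. The names predictors, fit_one and results are hypothetical. Note that inside a for loop nothing is printed automatically, so summary() needs an explicit print().

```r
# Data from the question
mydata <- data.frame(
  y    = c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1),
  var1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c"),
  var2 = c("m", "m", "n", "n", "n", "n", "o", "o", "o", "m")
)

predictors <- c("var1", "var2")

# fit_one is a hypothetical helper: one regression of y on a single predictor
fit_one <- function(v, data) {
  f <- reformulate(v, response = "y")  # e.g. y ~ var1
  summary(lm(f, data = data))
}

# Named list with one summary per predictor
results <- lapply(setNames(predictors, predictors), fit_one, data = mydata)

# for loops do not auto-print, so print each summary explicitly
for (v in predictors) {
  print(results[[v]])
}
```

Keeping the summaries in a named list means a single result can be pulled out later with results[["var1"]] instead of re-running the model.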

answered Jun 5, 2018 at 5:38, edited Jun 5, 2018 at 5:43


cderv · Accepted Answer · 2018-06-05 06:12:50Z

You can use functions from the tidyverse to reshape the data into tidy (long) format and fit one model per variable.

y <- c(0,0,1,1,0,1,0,1,1,1)
var1 <- c("a","a","a","b","b","b","c","c","c","c")
var2 <- c("m","m","n","n","n","n","o","o","o","m")

library(tidyverse)
mydata <- tibble(y, var1, var2)  # tibble() replaces the deprecated data_frame()

res <- mydata %>%
  # get data in long format - tidy format
  gather("var_type", "value", -y) %>%
  # we want one model per var_type
  nest(data = -var_type) %>%  # named form expected by tidyr >= 1.0.0
  # apply lm on each data
  mutate(
    regressionresult = map(data, ~lm(y ~ value, data = .x))
  )
res
#> # A tibble: 2 x 3
#>   var_type data              regressionresult
#>   <chr>    <list>            <list>          
#> 1 var1     <tibble [10 x 2]> <S3: lm>        
#> 2 var2     <tibble [10 x 2]> <S3: lm>
summary(res$regressionresult[[1]])
#> 
#> Call:
#> lm(formula = y ~ value, data = .x)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -0.7500 -0.3333  0.2500  0.3125  0.6667 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)   0.3333     0.3150   1.058    0.325
#> valueb        0.3333     0.4454   0.748    0.479
#> valuec        0.4167     0.4167   1.000    0.351
#> 
#> Residual standard error: 0.5455 on 7 degrees of freedom
#> Multiple R-squared:  0.1319, Adjusted R-squared:  -0.1161 
#> F-statistic: 0.532 on 2 and 7 DF,  p-value: 0.6094

The broom package can then help you work with the results

library(broom)
#> Warning: package 'broom' was built under R version 3.4.4
res <- res %>%
  mutate(tidy_summary = map(regressionresult, broom::tidy))
res
#> # A tibble: 2 x 4
#>   var_type data              regressionresult tidy_summary        
#>   <chr>    <list>            <list>           <list>              
#> 1 var1     <tibble [10 x 2]> <S3: lm>         <data.frame [3 x 5]>
#> 2 var2     <tibble [10 x 2]> <S3: lm>         <data.frame [3 x 5]>

You can extract one of the summaries

res$tidy_summary[[1]]
#>          term  estimate std.error statistic   p.value
#> 1 (Intercept) 0.3333333 0.3149704 1.0583005 0.3250657
#> 2      valueb 0.3333333 0.4454354 0.7483315 0.4786436
#> 3      valuec 0.4166667 0.4166667 1.0000000 0.3506167

or unnest the column to get a single data frame to work with.

res %>% 
  unnest(tidy_summary)
#> # A tibble: 6 x 6
#>   var_type term        estimate std.error statistic p.value
#>   <chr>    <chr>          <dbl>     <dbl>     <dbl>   <dbl>
#> 1 var1     (Intercept)    0.333     0.315     1.06    0.325
#> 2 var1     valueb         0.333     0.445     0.748   0.479
#> 3 var1     valuec         0.417     0.417     1.000   0.351
#> 4 var2     (Intercept)    0.333     0.315     1.06    0.325
#> 5 var2     valuen         0.417     0.417     1       0.351
#> 6 var2     valueo         0.333     0.445     0.748   0.479

The functions of interest are nest and unnest from tidyr (http://tidyr.tidyverse.org/), which make it easy to create and expand list columns; map from purrr, which iterates over a list and applies a function (here lm); and tidy from the broom package, which offers functions to tidy results from models (summary results, prediction results, ...).
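For comparison, the same fit-per-variable-then-stack idea can be roughly approximated in base R, without list columns. This is only a sketch under stated assumptions: coef_table, predictors and all_coefs are hypothetical names, and the column names are chosen to mirror the broom output shown above. Unlike the long-format version, the terms keep the original variable names (var1b rather than valueb).

```r
# Data from the question
mydata <- data.frame(
  y    = c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1),
  var1 = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c"),
  var2 = c("m", "m", "n", "n", "n", "n", "o", "o", "o", "m")
)

predictors <- c("var1", "var2")

# coef_table is a hypothetical helper: fit one model, return its
# coefficient table as a plain data frame
coef_table <- function(v, data) {
  fit <- lm(reformulate(v, response = "y"), data = data)
  est <- coef(summary(fit))  # matrix of estimates, std. errors, t and p values
  data.frame(
    var_type  = v,
    term      = rownames(est),
    estimate  = est[, "Estimate"],
    std.error = est[, "Std. Error"],
    statistic = est[, "t value"],
    p.value   = est[, "Pr(>|t|)"],
    row.names = NULL
  )
}

# Stack the per-variable tables into one data frame
all_coefs <- do.call(rbind, lapply(predictors, coef_table, data = mydata))
all_coefs
```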

Not used here, but note that the modelr package helps with building pipelines when modeling.
