Using nested for loops to create a data frame in R

Question

I was looking to find a way to use two for loops to create a specific data frame in R. I got started on a function but was having some difficulty with it. The first for loop would loop through the names of a list of data frames and the second for loop would loop through the columns of each data frame and give back the mean. The output would then give back a data frame with each row containing the means of the columns for one of the data frames. Here's some dummy data:

first <- data.frame(b = factor(c("Hi", "Hi", "Hi", "Hi")), y = c(8, 3, 9, 9),
                    z = c(1, 1, 1, 2))
second <- data.frame(b = factor(c("Med", "Med", "Med", "Med")), y = c(3, 2, 6, 5),
                     z = c(1, 11, 4, 3))

third <- list(first, second)
fourth <- c("first", "second")
names(third) <- fourth
fifth <- c("y", "z")

Here's the function I was working on:

testr<- function(arg1,arg2){
  a<- list()
  for(i in 1:length(arg1)){
   b<- (third[[arg1[i]]])
    for(j in 1:length(arg2)){
      c<- mean(b[[arg2[[j]]]])
      a[[j]]<-c
    }
  }
  df<- do.call("cbind",a)
  df<-as.data.frame(df)
  df$name<- arg1
  return(df)
}

My goal would be this result:

testr(fourth,fifth)

    V1   V2  name
1 7.25 1.25 first
2    4 4.75 second

But instead I get this:

testr(fourth,fifth)

 Error in `$<-.data.frame`(`*tmp*`, "name", value = c("first", "second" : 
  replacement has 2 rows, data has 1

Any help would be greatly appreciated!

aggregate(. ~ b, data = rbind(first, second), mean) gives something that resembles your desired output. But perhaps I don't understand the full complexity of your problem.
– Henrik, commented Jan 31, 2016 at 23:16

Gopala · Accepted Answer · 2016-01-31 23:15:33Z

1

Assuming you have many data frames like first and second, collected in a list, you can use dplyr to get the desired result:

library(dplyr)
l <- list(first, second)
df <- do.call(rbind, l)
df %>% group_by(b) %>% summarise_each(funs(mean))

Output is:

Source: local data frame [2 x 3]

       b     y     z
  (fctr) (dbl) (dbl)
1     Hi  7.25  1.25
2    Med  4.00  4.75
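
A note for readers on newer dplyr versions: summarise_each() and funs() have since been deprecated. The following is a minimal sketch of an equivalent call using across(), assuming dplyr 1.0 or later is installed. The data is the dummy data from the question, and the name `result` is only illustrative.

```r
library(dplyr)

# Dummy data from the question
first <- data.frame(b = factor(rep("Hi", 4)), y = c(8, 3, 9, 9), z = c(1, 1, 1, 2))
second <- data.frame(b = factor(rep("Med", 4)), y = c(3, 2, 6, 5), z = c(1, 11, 4, 3))

# Stack the data frames, then take the mean of y and z within each level of b
result <- bind_rows(first, second) %>%
  group_by(b) %>%
  summarise(across(c(y, z), mean))

result
```

This should give the same means as the output above (7.25 and 1.25 for Hi, 4.00 and 4.75 for Med), although the printed header will look different on current versions.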


Brandon Bertelsen · Accepted Answer · 2016-01-31 23:54:10Z

1

My advice... let's just avoid for loops altogether. It looks like you just want the mean of the two columns and the name of the data.frame.

Pick up some skills with dplyr or data.table that make this type of summarization trivial.

library(dplyr)
third %>% 
  bind_rows(.id = "name") %>% 
  group_by(name) %>% 
  summarize(
    V1 = mean(y), 
    V2 = mean(z))

# Source: local data frame [2 x 3]
#
#     name    V1    V2
#    (chr) (dbl) (dbl)
# 1  first  7.25  1.25
# 2 second  4.00  4.75


library(data.table)
dt <- rbindlist(third)
dt[,list(V1 = mean(y),V2 = mean(z)),by = b]
#      b   V1   V2
# 1:  Hi 7.25 1.25
# 2: Med 4.00 4.75

# or as David points out.
dt[, lapply(.SD, mean), by = b]
#      b    y    z
# 1:  Hi 7.25 1.25
# 2: Med 4.00 4.75
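
If you do want to keep the loops, it may help to see why the original function fails: `a[[j]]` is overwritten on every pass of the outer loop, so only the means for the last data frame survive, and cbind() then produces a single row that cannot take a two-element name column. Below is a minimal sketch of one possible fix, not the only one. The function name `testr_fixed` and its arguments `dfs`, `df_names` and `cols` are hypothetical, and the list is passed in explicitly rather than read from the global `third`.

```r
# Dummy data from the question
first <- data.frame(b = factor(rep("Hi", 4)), y = c(8, 3, 9, 9), z = c(1, 1, 1, 2))
second <- data.frame(b = factor(rep("Med", 4)), y = c(3, 2, 6, 5), z = c(1, 11, 4, 3))
third <- list(first = first, second = second)

# Hypothetical corrected version of testr(): one vector of means per data
# frame, stored by the OUTER index so earlier results are not overwritten
testr_fixed <- function(dfs, df_names, cols) {
  rows <- vector("list", length(df_names))
  for (i in seq_along(df_names)) {
    b <- dfs[[df_names[i]]]
    means <- numeric(length(cols))
    for (j in seq_along(cols)) {
      means[j] <- mean(b[[cols[j]]])
    }
    rows[[i]] <- means
  }
  # rbind (not cbind) so each data frame becomes one row
  df <- as.data.frame(do.call(rbind, rows))
  df$name <- df_names
  df
}

result <- testr_fixed(third, c("first", "second"), c("y", "z"))
result

# The same numbers without explicit loops, as a matrix with one row per data frame
compact <- t(vapply(third[c("first", "second")],
                    function(d) colMeans(d[c("y", "z")]),
                    numeric(2)))
compact
```

The loop version returns columns V1, V2 and name, matching the layout asked for in the question. The vapply() version keeps the original column names y and z instead.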

edited Jan 31, 2016 at 23:54

answered Jan 31, 2016 at 23:13


1 Comment

David Arenburg Over a year ago

You can do dt[, lapply(.SD, mean), by = b]
