I'm trying to use two nested for loops to build a specific data frame in R. I got started on a function but ran into trouble with it. The outer loop should iterate over the names of a list of data frames, and the inner loop should iterate over selected columns of each data frame and compute their means. The output should be a data frame in which each row holds the column means for one of the input data frames. Here's some dummy data:

first<- data.frame(b = factor(c("Hi", "Hi","Hi","Hi")), y = c(8, 3, 9, 9),
               z = c(1, 1, 1, 2))
second<- data.frame(b = factor(c("Med", "Med", "Med", "Med")),y = c(3, 2, 6, 5),
                z = c(1, 11, 4, 3))

third<- list(first,second)
fourth<- c("first","second")
names(third)<- c(fourth)
fifth<- c("y","z")

Here's the function I was working on:

testr<- function(arg1,arg2){
  a<- list()
  for(i in 1:length(arg1)){
   b<- (third[[arg1[i]]])
    for(j in 1:length(arg2)){
      c<- mean(b[[arg2[[j]]]])
      a[[j]]<-c
    }
  }
  df<- do.call("cbind",a)
  df<-as.data.frame(df)
  df$name<- arg1
  return(df)
}

My goal would be this result:

testr(fourth,fifth)

    V1   V2  name
1 7.25 1.25 first
2    4 4.75 second

But instead I get this:

testr(fourth,fifth)

 Error in `$<-.data.frame`(`*tmp*`, "name", value = c("first", "second" : 
  replacement has 2 rows, data has 1 

Any help would be greatly appreciated!

    aggregate(. ~ b, data = rbind(first, second), mean) gives something which resembles your desired output. But perhaps I don't understand the full complexity of your problem. Commented Jan 31, 2016 at 23:16
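The aggregate() call from the comment above can be checked against the dummy data. This is only a sketch; it rebuilds first and second so it runs on its own, and it groups by the factor b rather than by the name of each data frame, which happens to give the same rows here because each data frame has a single level of b.

```r
# Dummy data copied from the question
first <- data.frame(b = factor(c("Hi", "Hi", "Hi", "Hi")), y = c(8, 3, 9, 9),
                    z = c(1, 1, 1, 2))
second <- data.frame(b = factor(c("Med", "Med", "Med", "Med")), y = c(3, 2, 6, 5),
                     z = c(1, 11, 4, 3))

# Stack the two data frames, then take the mean of every other column per level of b
res <- aggregate(. ~ b, data = rbind(first, second), FUN = mean)
print(res)
```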

2 Answers

Assuming first and second are two of many such data frames collected in a list, you can use dplyr to get the desired result:

library(dplyr)
l <- list(first, second)
df <- do.call(rbind, l)
df %>% group_by(b) %>% summarise_each(funs(mean))

Output is:

Source: local data frame [2 x 3]

       b     y     z
  (fctr) (dbl) (dbl)
1     Hi  7.25  1.25
2    Med  4.00  4.75
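Note that summarise_each() and funs() have since been deprecated in dplyr. A rough equivalent with across() might look like the sketch below; it assumes dplyr 1.0.0 or later is installed, and it rebuilds the dummy data so it runs on its own.

```r
library(dplyr)

# Dummy data copied from the question
first <- data.frame(b = factor(c("Hi", "Hi", "Hi", "Hi")), y = c(8, 3, 9, 9),
                    z = c(1, 1, 1, 2))
second <- data.frame(b = factor(c("Med", "Med", "Med", "Med")), y = c(3, 2, 6, 5),
                     z = c(1, 11, 4, 3))

l <- list(first, second)
df <- do.call(rbind, l)

# across(everything(), mean) applies mean to every non-grouping column
res <- df %>%
  group_by(b) %>%
  summarise(across(everything(), mean))
print(res)
```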
My advice: avoid for loops altogether. It looks like you just want the mean of the two columns and the name of the data.frame.

Pick up some skills with dplyr or data.table that make this type of summarization trivial.

library(dplyr)
third %>% 
  bind_rows(.id = "name") %>% 
  group_by(name) %>% 
  summarize(
    V1 = mean(y), 
    V2 = mean(z))

# Source: local data frame [2 x 3]
#
#     name    V1    V2
#    (chr) (dbl) (dbl)
# 1  first  7.25  1.25
# 2 second  4.00  4.75


library(data.table)
dt <- rbindlist(third)
dt[,list(V1 = mean(y),V2 = mean(z)),by = b]
#      b   V1   V2
# 1:  Hi 7.25 1.25
# 2: Med 4.00 4.75

# or as David points out.
dt[, lapply(.SD, mean), by = b]
#      b    y    z
# 1:  Hi 7.25 1.25
# 2: Med 4.00 4.75
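
If you do want to keep the loops, the error in the question seems to come from a[[j]] being overwritten on every pass of the outer loop, so a ends up holding only the means of the last data frame. cbind then yields a single row, and assigning a two-element name column to it fails. Below is a minimal sketch of one possible fix, assuming you collect one vector of means per data frame and rbind them at the end. The dfs argument is my addition (not in the original function) so that it no longer depends on the global third.

```r
# Dummy data and helper vectors copied from the question
first <- data.frame(b = factor(c("Hi", "Hi", "Hi", "Hi")), y = c(8, 3, 9, 9),
                    z = c(1, 1, 1, 2))
second <- data.frame(b = factor(c("Med", "Med", "Med", "Med")), y = c(3, 2, 6, 5),
                     z = c(1, 11, 4, 3))
third <- list(first = first, second = second)
fourth <- c("first", "second")
fifth <- c("y", "z")

# dfs is a hypothetical extra argument: the named list of data frames
testr <- function(arg1, arg2, dfs) {
  rows <- vector("list", length(arg1))      # one slot per data frame
  for (i in seq_along(arg1)) {
    b <- dfs[[arg1[i]]]
    means <- numeric(length(arg2))          # one mean per requested column
    for (j in seq_along(arg2)) {
      means[j] <- mean(b[[arg2[j]]])
    }
    rows[[i]] <- means                      # index by i, so nothing is overwritten
  }
  df <- as.data.frame(do.call(rbind, rows)) # one row per data frame: V1, V2, ...
  df$name <- arg1
  df
}

res <- testr(fourth, fifth, third)
print(res)
```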

1 Comment

You can do dt[, lapply(.SD, mean), by = b]
