0

Let's say I make a dummy data frame with 6 columns and 10 observations:

X <- data.frame(a=1:10, b=11:20, c=21:30, d=31:40, e=41:50, f=51:60)

I need to create a loop that evaluates 3 columns at a time, adding the summed second and third columns and dividing this by the sum of the first column:

 (sum(b)+sum(c))/sum(a) ... (sum(e)+sum(f))/sum(d) ...

I then need to construct a final dataframe from these values. For example using the dummy dataframe above, it would look like:

        value
1.     7.454545
2.     2.84507

I imagine I need to use the next function to iterate within the loop, but I'm fairly lost! Thank you for any help.

  • Do you repeat the values? E.g. (sum(b)+sum(c))/sum(a), then (sum(d)+sum(c))/sum(a), or should it be (sum(d)+sum(c))/sum(b)? Commented Jul 30, 2020 at 16:00
  • Hi Onyambu, no, the values don't repeat; it's every 3 discrete columns. So (b+c)/a, then (e+f)/d, and so on. Commented Jul 30, 2020 at 19:37

3 Answers

1

You can split your data frame into groups of 3 columns by building a grouping vector with rep, in which each group number repeats 3 times, and passing it to split.default, which splits a data frame by column. Then sapply over the resulting list of sub data frames: sum the second and third columns, add the two sums, and divide by the sum of the first column.

out_vec <- 
  sapply(
    split.default(X, rep(1:ncol(X), each = 3, length.out = ncol(X)))
    , function(x) (sum(x[2]) + sum(x[3]))/sum(x[1]))

data.frame(value = out_vec)
#      value
# 1 7.454545
# 2 2.845070
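
To see what the split is doing, it can help to look at the grouping vector and the resulting pieces on their own. This is only an inspection sketch using the question's X; the names idx and groups are mine, not part of the answer above.

```r
X <- data.frame(a = 1:10, b = 11:20, c = 21:30, d = 31:40, e = 41:50, f = 51:60)

# One group number per column, each number repeated 3 times
idx <- rep(1:ncol(X), each = 3, length.out = ncol(X))
idx
# [1] 1 1 1 2 2 2

# split.default treats the data frame as a list of columns,
# so each element of `groups` is a 3-column sub data frame
groups <- split.default(X, idx)
lapply(groups, names)
```

The first element holds columns a, b, c and the second holds d, e, f, which is why x[1], x[2] and x[3] inside the sapply refer to the first, second and third column of each block.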

You could also sum all the columns up front with colSums before the sapply, which should be more efficient, since only a short vector of column totals is split rather than the data frame itself.

out_vec <- 
  sapply(
    split(colSums(X), rep(1:ncol(X), each = 3, length.out = ncol(X)))
    , function(x) (x[2] + x[3])/x[1])

data.frame(value = out_vec, row.names = NULL)
#      value
# 1 7.454545
# 2 2.845070
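
If the number of columns is an exact multiple of 3, the column totals can also be reshaped into a 3-row matrix so that the arithmetic is done for all blocks at once. This is a variation I'm sketching on the colSums idea, not something from the code above; the names totals and m are just illustrative, and it assumes ncol(X) is divisible by 3 (otherwise matrix would recycle values with a warning).

```r
X <- data.frame(a = 1:10, b = 11:20, c = 21:30, d = 31:40, e = 41:50, f = 51:60)

totals <- colSums(X)

# matrix() fills column by column, so each matrix column holds
# the three totals of one block: (a, b, c), then (d, e, f)
m <- matrix(totals, nrow = 3)

# (second + third) / first, computed for every block at once
value <- (m[2, ] + m[3, ]) / m[1, ]

result <- data.frame(value = value)
result
```

For the dummy data this gives the same two values as above, 410/55 and 1010/355.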

1

You could use tapply:

tapply(colSums(X), gl(ncol(X)/3, 3), function(x)sum(x[-1])/x[1])
       1        2 
7.454545 2.845070 
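
Note that gl(ncol(X)/3, 3) only lines up with the columns when the column count is a multiple of 3. A possible way to package this, with a guard for that assumption, is sketched below; ratio_by_block and its size argument are hypothetical names of mine, not an existing function.

```r
X <- data.frame(a = 1:10, b = 11:20, c = 21:30, d = 31:40, e = 41:50, f = 51:60)

# Hypothetical helper: (sum of remaining columns) / (sum of first column),
# computed for each consecutive block of `size` columns
ratio_by_block <- function(df, size = 3) {
  # Fail early if the columns do not divide evenly into blocks
  stopifnot(ncol(df) %% size == 0)
  totals <- colSums(df)
  # gl() builds the block labels: 1 1 1 2 2 2 for six columns
  grp <- gl(ncol(df) / size, size)
  out <- tapply(totals, grp, function(x) sum(x[-1]) / x[1])
  # as.vector() drops the names so the row names are plain 1, 2, ...
  data.frame(value = as.vector(out))
}

res <- ratio_by_block(X)
res
```

With size = 3 this reproduces the tapply result as a one-column data frame; a call such as ratio_by_block(X[1:5]) stops with an error instead of silently misaligning the groups.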


0

Here is an option with tidyverse

library(dplyr) # 1.0.0
library(tidyr)
X %>% 
     summarise(across(everything(), sum)) %>% 
     pivot_longer(everything()) %>% 
     group_by(grp = as.integer(gl(n(), 3, n()))) %>% 
     summarise(value = sum(lead(value)/first(value), na.rm = TRUE)) %>% 
     select(value)
# A tibble: 2 x 1
#  value
#  <dbl>
#1  7.45
#2  2.85
