Suppose I have a list of dataframes l. All dataframes are guaranteed to have the same shape and contain the same columns.
I would like to combine the columns of those dataframes with a column-specific element-wise operation, defined in a list of functions comb_funcs, and generate a new dataframe.
For the sake of simplicity, let's assume the list has only 2 dataframes with only 2 columns:
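library(tibble)  # provides tribble()

df1 <- tribble(
  ~n_students, ~age,
  100, 16,
  130, 15,
  110, 14
)
df2 <- tribble(
  ~n_students, ~age,
  150, 13,
  60, 12,
  75, 11
)
l <- list(df1, df2)
# Named list: names are the columns to combine, values are the functions to apply
comb_funcs <- list(
  n_students = sum,
  age = median
)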
In this example, the expected output is a new dataframe that contains 2 columns: n_students as the element-wise sum of the n_students columns, and age as the element-wise median of the age columns.
Here is what I tried:
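comb_dfs <- function(l, comb_funcs) {
  # Start from the first dataframe and fold the remaining ones into it
  fin_df <- l[[1]]
  l <- l[-1]
  for (df in l) {
    for (var in names(comb_funcs)) {
      # Combine the running result with the current dataframe, element by element
      fin_df[[var]] <- mapply(
        function(x, y) comb_funcs[[var]](c(x, y)),
        fin_df[[var]],
        df[[var]]
      )
    }
  }
  # Return only after every dataframe has been processed.
  # Note: values are combined pairwise, so with more than 2 dataframes the
  # result is only exact for functions like sum, not for median.
  return(fin_df)
}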
In the example above, this returns the expected output:
> comb_dfs(l, comb_funcs)
# A tibble: 3 × 2
  n_students   age
       <dbl> <dbl>
1        250  14.5
2        190  13.5
3        185  12.5
But my function seems cumbersome.
Is there a cleaner way to do this with the tidyverse?
Please note that the MWE is written with 2 dataframes and 2 columns, but in real life there might be many dataframes with many columns. Therefore, the inputs of the solution must be l (the list of dataframes) and comb_funcs (a named list where the names are the columns to process and the values are the functions to use).
All dataframes in the list are guaranteed to have the same shape and the same columns. names(comb_funcs) is guaranteed to be a subset of those columns.
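For reference, here is a rough sketch of the kind of thing I have in mind, using purrr. The name comb_dfs_tidy is made up for illustration, and the sketch assumes every function in comb_funcs returns a single numeric value. Unlike my loop, it passes all values for a given row to the function at once instead of combining them pairwise, and it keeps only the columns named in comb_funcs.

```r
library(purrr)
library(tibble)

# Example data from the question
df1 <- tribble(
  ~n_students, ~age,
  100, 16,
  130, 15,
  110, 14
)
df2 <- tribble(
  ~n_students, ~age,
  150, 13,
  60, 12,
  75, 11
)
l <- list(df1, df2)
comb_funcs <- list(n_students = sum, age = median)

# Hypothetical helper (not from any package): for each column named in
# comb_funcs, collect that column from every dataframe and apply the
# matching function row by row across all of them.
comb_dfs_tidy <- function(l, comb_funcs) {
  combined <- imap(comb_funcs, function(f, var) {
    cols <- map(l, var)                      # one vector per dataframe
    pmap_dbl(cols, function(...) f(c(...)))  # assumes f returns one number
  })
  as_tibble(combined)
}

res <- comb_dfs_tidy(l, comb_funcs)
print(res)
```

If the functions could return non-numeric values, pmap_dbl would need to be swapped for a different pmap variant, so treat this as a starting point rather than a finished answer.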
If age and n_students are elements of all input data frames, and you are interested in sum(n_students) and median(age), why do you believe suggested solutions fail? In other words, why so many people suggest answers which do not fit the requirements given in your question (in your opinion)?