
I am trying to calculate the mean, median, min, and max of every variable, grouped by Site, using the summarize function. In my code, I replace NA with 0, but I am also open to using na.rm = TRUE instead if it is easy to incorporate.

I keep getting the following error message and cannot figure it out...

Error: Problem with `summarise()` input `..2`.
i `..2 = list(mean, median, min, max)`.
x `..2` must be size 6 or 1, not 4.
i An earlier column had size 6.
i The error occurred in group 1: Site = 1.

Below is my data and code:

Dataset Reprex

data = structure(list(Site = c(7, 1, 7, 7, 1, 1, 7, 1, 6, 1, 1), OS_days = c(264, 
208, 184, 145, 131, 116, 82, 74, 76, 82, 68), ster_days = c(241, 
135, 184, NA, 85, 106, NA, NA, NA, NA, 69), pct_ster = c(0.912878787878788, 
0.649038461538462, 1, NA, 0.648854961832061, 0.913793103448276, 
NA, NA, NA, NA, 1.01470588235294), first_ster_days = c(28, 72, 
1, NA, 42, 1, NA, NA, NA, NA, 1), tot_bev_days = c(1, 13, NA, 
NA, NA, 75, NA, NA, NA, NA, NA), pct_bev = c(0.00378787878787879, 
0.0625, NA, NA, NA, 0.646551724137931, NA, NA, NA, NA, NA), first_bev_days = c(48, 
86, NA, NA, NA, 22, NA, NA, NA, NA, NA), SPD = structure(c(1219.86, 
1107, 1508, 442.74, 524.61, 1733.76, 2079.77, 443.44, NA, 601.8, 
1621.3), label = "Measurement Number 1 mm")), row.names = c(NA, 
-11L), class = c("tbl_df", "tbl", "data.frame"))
knitr::kable(data, digits = 3)


| Site| OS_days| ster_days| pct_ster| first_ster_days| tot_bev_days| pct_bev| first_bev_days|     SPD|
|----:|-------:|---------:|--------:|---------------:|------------:|-------:|--------------:|-------:|
|    7|     264|       241|    0.913|              28|            1|   0.004|             48| 1219.86|
|    1|     208|       135|    0.649|              72|           13|   0.062|             86| 1107.00|
|    7|     184|       184|    1.000|               1|           NA|      NA|             NA| 1508.00|
|    7|     145|        NA|       NA|              NA|           NA|      NA|             NA|  442.74|
|    1|     131|        85|    0.649|              42|           NA|      NA|             NA|  524.61|
|    1|     116|       106|    0.914|               1|           75|   0.647|             22| 1733.76|
|    7|      82|        NA|       NA|              NA|           NA|      NA|             NA| 2079.77|
|    1|      74|        NA|       NA|              NA|           NA|      NA|             NA|  443.44|
|    6|      76|        NA|       NA|              NA|           NA|      NA|             NA|      NA|
|    1|      82|        NA|       NA|              NA|           NA|      NA|             NA|  601.80|
|    1|      68|        69|    1.015|               1|           NA|      NA|             NA| 1621.30|

Code

data %>%
  replace(is.na(.), 0) %>%
  group_by(Site) %>%
  dplyr::summarise(across(c(OS_days, ster_days, pct_ster, first_ster_days, tot_bev_days, pct_bev, first_bev_days, SPD)), list(mean, median, min, max)) 

2 Answers


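The closing parenthesis of across() was placed too early, so list(mean, median, min, max) was passed to summarise() as a separate input instead of to across() as its functions argument.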
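A minimal sketch of why the misplaced parenthesis fails, using a small made-up data frame (toy is hypothetical, not the data from the question). With the list outside across(), summarise() treats it as a second input of length 4, which cannot be recycled to the size of the group. The exact wording of the error may differ between dplyr versions.

```r
library(dplyr)

# Hypothetical toy data: a single group with 3 rows
toy <- tibble(Site = c(1, 1, 1), OS_days = c(208, 131, 116))

# The list sits outside across(), so summarise() receives two inputs:
# across(OS_days) and an unrelated list of length 4. Capture the error message.
msg <- tryCatch(
  toy %>%
    group_by(Site) %>%
    summarise(across(OS_days), list(mean, median, min, max)),
  error = function(e) conditionMessage(e)
)

# With the list inside across(), it is used as the set of functions to apply
fixed <- toy %>%
  group_by(Site) %>%
  summarise(across(OS_days, list(mean, median, min, max)))
```

Here msg holds the error text from the first call, while fixed has one row per site and one column per variable and function.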

library(dplyr)
data %>%
  replace(is.na(.), 0) %>% 
  group_by(Site) %>%
  dplyr::summarise(across(c(OS_days, ster_days, pct_ster, 
      first_ster_days, tot_bev_days, pct_bev, first_bev_days, SPD), 
        list(mean, median, min, max)))

-output

# A tibble: 3 x 33
   Site OS_days_1 OS_days_2 OS_days_3 OS_days_4 ster_days_1 ster_days_2 ster_days_3 ster_days_4 pct_ster_1 pct_ster_2 pct_ster_3 pct_ster_4 first_ster_days_1
  <dbl>     <dbl>     <dbl>     <dbl>     <dbl>       <dbl>       <dbl>       <dbl>       <dbl>      <dbl>      <dbl>      <dbl>      <dbl>             <dbl>
1     1      113.       99         68       208        65.8          77           0         135      0.538      0.649          0       1.01             19.3 
2     6       76        76         76        76         0             0           0           0      0          0              0       0                 0   
3     7      169.      164.        82       264       106.           92           0         241      0.478      0.456          0       1                 7.25
# … with 19 more variables: first_ster_days_2 <dbl>, first_ster_days_3 <dbl>, first_ster_days_4 <dbl>, tot_bev_days_1 <dbl>, tot_bev_days_2 <dbl>,
#   tot_bev_days_3 <dbl>, tot_bev_days_4 <dbl>, pct_bev_1 <dbl>, pct_bev_2 <dbl>, pct_bev_3 <dbl>, pct_bev_4 <dbl>, first_bev_days_1 <dbl>,
#   first_bev_days_2 <dbl>, first_bev_days_3 <dbl>, first_bev_days_4 <dbl>, SPD_1 <dbl>, SPD_2 <dbl>, SPD_3 <dbl>, SPD_4 <dbl>
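Regarding the na.rm = TRUE alternative mentioned in the question, the following is a sketch under stated assumptions rather than part of the original answer. It uses a small hypothetical data frame (toy) and a named list of lambdas (stats, also a name chosen here for illustration), so the output columns get suffixes such as _mean instead of _1.

```r
library(dplyr)

# Hypothetical toy data (not the data from the question) with one missing value
toy <- tibble(
  Site    = c(1, 1, 1, 7, 7),
  OS_days = c(208, 131, 116, 264, 184),
  SPD     = c(1107, NA, 1733.76, 1219.86, 1508)
)

# Named list of lambdas: the names become column suffixes, and na.rm = TRUE
# drops NA values inside each function instead of replacing them with 0
stats <- list(
  mean   = ~ mean(.x, na.rm = TRUE),
  median = ~ median(.x, na.rm = TRUE),
  min    = ~ min(.x, na.rm = TRUE),
  max    = ~ max(.x, na.rm = TRUE)
)

result <- toy %>%
  group_by(Site) %>%
  summarise(across(c(OS_days, SPD), stats))
```

Note that dropping NA values and replacing them with 0 give different statistics. Also, when a group contains only NA in a column (as Site 6 does for several columns in the question's data), min() and max() with na.rm = TRUE return Inf and -Inf with a warning, and mean() returns NaN.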

2 Comments

Lord almighty... can you tell it's been a tough week and it's only Tuesday! Thanks so much!
@akrun: Could you please check my answer and tell me why I get the wrong numbers? Thank you.

With many thanks to akrun for guiding me, here is a base R solution.

# function with all functions to apply
multi.fun <- function(x) {
    c(mean = mean(x), median = median(x), min = min(x), max = max(x))
}

# replace NA with 0
data[is.na(data)] <- 0 

# group by Site and apply function multi.fun
my_list <- lapply(split(data, data$Site), function(x) sapply(x, multi.fun))

# convert to df
do.call(rbind, my_list)

Output:

       Site  OS_days ster_days  pct_ster first_ster_days tot_bev_days      pct_bev first_bev_days      SPD
mean      1 113.1667  65.83333 0.5377321        19.33333     14.66667 0.1181752874             18 1005.318
median    1  99.0000  77.00000 0.6489467         1.00000      0.00000 0.0000000000              0  854.400
min       1  68.0000   0.00000 0.0000000         0.00000      0.00000 0.0000000000              0  443.440
max       1 208.0000 135.00000 1.0147059        72.00000     75.00000 0.6465517241             86 1733.760
mean      6  76.0000   0.00000 0.0000000         0.00000      0.00000 0.0000000000              0    0.000
median    6  76.0000   0.00000 0.0000000         0.00000      0.00000 0.0000000000              0    0.000
min       6  76.0000   0.00000 0.0000000         0.00000      0.00000 0.0000000000              0    0.000
max       6  76.0000   0.00000 0.0000000         0.00000      0.00000 0.0000000000              0    0.000
mean      7 168.7500 106.25000 0.4782197         7.25000      0.25000 0.0009469697             12 1312.592
median    7 164.5000  92.00000 0.4564394         0.50000      0.00000 0.0000000000              0 1363.930
min       7  82.0000   0.00000 0.0000000         0.00000      0.00000 0.0000000000              0  442.740
max       7 264.0000 241.00000 1.0000000        28.00000      1.00000 0.0037878788             48 2079.770

5 Comments

@akrun: Could you please check my answer and tell me why I get the wrong numbers? Thank you.
I don't think you took into account the grouping by site. Random, but do you happen to be a UNC Tarheels fan (based on the username)?
As Adam mentioned, there is a group-by operation, i.e. lapply(split(data, data$Site), function(x) sapply(x, multi.fun))
Thank you both very much. @akrun, special thanks to you. Now I have a list with 3 elements; how do I get a data frame from it? (I am still learning base R fundamentals.) Thank you!
You can use do.call(rbind, .. or do.call(cbind, .
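As a sketch of the last step discussed in these comments, the list of per-site matrices can also be turned into a regular data frame with the statistic stored in its own column rather than in repeated row names. The data frame toy and the names pieces and stats_df are hypothetical and chosen only for illustration; multi.fun is the function from the answer.

```r
# Hypothetical toy data (not the data from the question)
toy <- data.frame(
  Site    = c(1, 1, 7, 7, 7),
  OS_days = c(208, 131, 264, 184, 145)
)

# Same helper as in the answer: several statistics for one vector
multi.fun <- function(x) {
  c(mean = mean(x), median = median(x), min = min(x), max = max(x))
}

# One matrix per site: rows are statistics, columns are variables
my_list <- lapply(split(toy, toy$Site), function(x) sapply(x, multi.fun))

# Convert each matrix to a data frame with the statistic as a column,
# then stack the pieces and reset the row names
pieces <- lapply(my_list, function(m) {
  data.frame(stat = rownames(m), m, row.names = NULL)
})
stats_df <- do.call(rbind, pieces)
rownames(stats_df) <- NULL
```

The result has one row per site and statistic, which is easier to filter, for example with stats_df[stats_df$stat == "mean", ].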
