Continual error with summarize function dplyr

Question

I am trying to calculate the mean, median, min, and max of all variables within each Site group using summarise(). In my code I replace NA with 0, but I am also open to using na.rm = TRUE instead if it is easy to incorporate.

I keep getting the following error message and cannot figure out why:

Error: Problem with `summarise()` input `..2`.
i `..2 = list(mean, median, min, max)`.
x `..2` must be size 6 or 1, not 4.
i An earlier column had size 6.
i The error occurred in group 1: Site = 1.
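The size check behind this message can be reproduced on a small hypothetical tibble (toy, one group of three rows; not the question's data). This is a minimal sketch assuming dplyr 1.0 or later, and the exact error wording differs slightly between versions. When across() is closed before the function list, list(mean, median, min, max) becomes a separate, unnamed second argument of summarise() (the `..2` in the message), a length-4 value that cannot sit next to a 3-row result.

```r
library(dplyr)

# Hypothetical toy data: one group ("a") with three rows
toy <- tibble(g = c("a", "a", "a"), x = c(1, 2, 3))

# Misplaced parenthesis: across() closes before the function list,
# so the list becomes a separate summarise() argument of length 4
bad <- tryCatch(
  toy %>%
    group_by(g) %>%
    summarise(across(c(x)), list(mean, median, min, max)),
  error = function(e) e
)

# Correct placement: the list is the .fns argument of across()
good <- toy %>%
  group_by(g) %>%
  summarise(across(c(x), list(mean, median, min, max)))
```

With the list inside across(), each function yields one value per group, and the unnamed functions are numbered in the column names (x_1 to x_4), matching the _1 to _4 suffixes in the accepted answer's output.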

Below is my data and code:

Dataset Reprex

data = structure(list(Site = c(7, 1, 7, 7, 1, 1, 7, 1, 6, 1, 1), OS_days = c(264, 
208, 184, 145, 131, 116, 82, 74, 76, 82, 68), ster_days = c(241, 
135, 184, NA, 85, 106, NA, NA, NA, NA, 69), pct_ster = c(0.912878787878788, 
0.649038461538462, 1, NA, 0.648854961832061, 0.913793103448276, 
NA, NA, NA, NA, 1.01470588235294), first_ster_days = c(28, 72, 
1, NA, 42, 1, NA, NA, NA, NA, 1), tot_bev_days = c(1, 13, NA, 
NA, NA, 75, NA, NA, NA, NA, NA), pct_bev = c(0.00378787878787879, 
0.0625, NA, NA, NA, 0.646551724137931, NA, NA, NA, NA, NA), first_bev_days = c(48, 
86, NA, NA, NA, 22, NA, NA, NA, NA, NA), SPD = structure(c(1219.86, 
1107, 1508, 442.74, 524.61, 1733.76, 2079.77, 443.44, NA, 601.8, 
1621.3), label = "Measurement Number 1 mm")), row.names = c(NA, 
-11L), class = c("tbl_df", "tbl", "data.frame"))

knitr::kable(data, digits = 3)


| Site| OS_days| ster_days| pct_ster| first_ster_days| tot_bev_days| pct_bev| first_bev_days|     SPD|
|----:|-------:|---------:|--------:|---------------:|------------:|-------:|--------------:|-------:|
|    7|     264|       241|    0.913|              28|            1|   0.004|             48| 1219.86|
|    1|     208|       135|    0.649|              72|           13|   0.062|             86| 1107.00|
|    7|     184|       184|    1.000|               1|           NA|      NA|             NA| 1508.00|
|    7|     145|        NA|       NA|              NA|           NA|      NA|             NA|  442.74|
|    1|     131|        85|    0.649|              42|           NA|      NA|             NA|  524.61|
|    1|     116|       106|    0.914|               1|           75|   0.647|             22| 1733.76|
|    7|      82|        NA|       NA|              NA|           NA|      NA|             NA| 2079.77|
|    1|      74|        NA|       NA|              NA|           NA|      NA|             NA|  443.44|
|    6|      76|        NA|       NA|              NA|           NA|      NA|             NA|      NA|
|    1|      82|        NA|       NA|              NA|           NA|      NA|             NA|  601.80|
|    1|      68|        69|    1.015|               1|           NA|      NA|             NA| 1621.30|

Code

data %>%
  replace(is.na(.), 0) %>%
  group_by(Site) %>%
  dplyr::summarise(across(c(OS_days, ster_days, pct_ster, first_ster_days, tot_bev_days, pct_bev, first_bev_days, SPD)), list(mean, median, min, max))

akrun · Accepted Answer · 2021-08-10 19:12:51Z

3

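The closing parenthesis of across() came too early. With across(c(...)) closed right after the column selection, list(mean, median, min, max) was passed to summarise() as a separate second argument (the ..2 in the error) instead of as the .fns argument of across(), so summarise() tried to create a length-4 column next to the 6 rows of the first group. Move the parenthesis so the function list sits inside across():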

library(dplyr)
data %>%
  replace(is.na(.), 0) %>%
  group_by(Site) %>%
  dplyr::summarise(across(c(OS_days, ster_days, pct_ster, first_ster_days,
                            tot_bev_days, pct_bev, first_bev_days, SPD),
                          list(mean, median, min, max)))

Output:

# A tibble: 3 x 33
   Site OS_days_1 OS_days_2 OS_days_3 OS_days_4 ster_days_1 ster_days_2 ster_days_3 ster_days_4 pct_ster_1 pct_ster_2 pct_ster_3 pct_ster_4 first_ster_days_1
  <dbl>     <dbl>     <dbl>     <dbl>     <dbl>       <dbl>       <dbl>       <dbl>       <dbl>      <dbl>      <dbl>      <dbl>      <dbl>             <dbl>
1     1      113.       99         68       208        65.8          77           0         135      0.538      0.649          0       1.01             19.3 
2     6       76        76         76        76         0             0           0           0      0          0              0       0                 0   
3     7      169.      164.        82       264       106.           92           0         241      0.478      0.456          0       1                 7.25
# … with 19 more variables: first_ster_days_2 <dbl>, first_ster_days_3 <dbl>, first_ster_days_4 <dbl>, tot_bev_days_1 <dbl>, tot_bev_days_2 <dbl>,
#   tot_bev_days_3 <dbl>, tot_bev_days_4 <dbl>, pct_bev_1 <dbl>, pct_bev_2 <dbl>, pct_bev_3 <dbl>, pct_bev_4 <dbl>, first_bev_days_1 <dbl>,
#   first_bev_days_2 <dbl>, first_bev_days_3 <dbl>, first_bev_days_4 <dbl>, SPD_1 <dbl>, SPD_2 <dbl>, SPD_3 <dbl>, SPD_4 <dbl>
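Since the question is also open to na.rm = TRUE, here is a sketch that skips missing values instead of zero-filling them and gives the output columns readable suffixes. It assumes dplyr 1.0 or later; data_sub is a hypothetical three-column subset of the question's data, and safe() and summary_fns are illustrative helpers, not dplyr functions. The wrapper makes a group whose column is entirely NA (Site 6) return NA rather than the NaN, Inf and -Inf (the last two with warnings) that mean(), min() and max() give for an empty vector.

```r
library(dplyr)

# Hypothetical subset of the question's data (Site, ster_days, SPD), NAs kept
data_sub <- tibble(
  Site      = c(7, 1, 7, 7, 1, 1, 7, 1, 6, 1, 1),
  ster_days = c(241, 135, 184, NA, 85, 106, NA, NA, NA, NA, 69),
  SPD       = c(1219.86, 1107, 1508, 442.74, 524.61, 1733.76, 2079.77,
                443.44, NA, 601.8, 1621.3)
)

# Wrap a summary function so it drops NAs, and returns NA when nothing is left
safe <- function(f) {
  force(f)
  function(x) if (all(is.na(x))) NA_real_ else f(x[!is.na(x)])
}

# Named list: the names become column suffixes (ster_days_mean, SPD_max, ...)
summary_fns <- list(mean = safe(mean), median = safe(median),
                    min = safe(min), max = safe(max))

res <- data_sub %>%
  group_by(Site) %>%
  summarise(across(c(ster_days, SPD), summary_fns), .groups = "drop")
```

The results differ from the zero-filled version: Site 1's mean ster_days becomes 98.75 instead of 65.83, because the NA rows no longer count as zeros. Which one is appropriate depends on whether NA means "zero days" or "unknown" in this data.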

answered Aug 10, 2021 at 19:12

akrun



2 Comments

Adam Over a year ago

Lord almighty... can you tell it's been a tough week and it's only Tuesday? Thanks so much!

TarJae Over a year ago

@akrun: Dear master, could you please check my answer? Why do I get wrong numbers? Thank you.

TarJae · Accepted Answer · 2021-08-10 21:11:25Z

2

With many thanks to akrun for guiding me, here is a base R solution.

# function with all functions to apply
multi.fun <- function(x) {
    c(mean = mean(x), median = median(x), min = min(x), max = max(x))
}

# replace NA with 0
data[is.na(data)] <- 0 

# group by Site and apply function multi.fun
my_list <- lapply(split(data, data$Site), function(x) sapply(x, multi.fun))

# convert to df
do.call(rbind, my_list)

Output:

       Site  OS_days ster_days  pct_ster first_ster_days tot_bev_days      pct_bev first_bev_days      SPD
mean      1 113.1667  65.83333 0.5377321        19.33333     14.66667 0.1181752874             18 1005.318
median    1  99.0000  77.00000 0.6489467         1.00000      0.00000 0.0000000000              0  854.400
min       1  68.0000   0.00000 0.0000000         0.00000      0.00000 0.0000000000              0  443.440
max       1 208.0000 135.00000 1.0147059        72.00000     75.00000 0.6465517241             86 1733.760
mean      6  76.0000   0.00000 0.0000000         0.00000      0.00000 0.0000000000              0    0.000
median    6  76.0000   0.00000 0.0000000         0.00000      0.00000 0.0000000000              0    0.000
min       6  76.0000   0.00000 0.0000000         0.00000      0.00000 0.0000000000              0    0.000
max       6  76.0000   0.00000 0.0000000         0.00000      0.00000 0.0000000000              0    0.000
mean      7 168.7500 106.25000 0.4782197         7.25000      0.25000 0.0009469697             12 1312.592
median    7 164.5000  92.00000 0.4564394         0.50000      0.00000 0.0000000000              0 1363.930
min       7  82.0000   0.00000 0.0000000         0.00000      0.00000 0.0000000000              0  442.740
max       7 264.0000 241.00000 1.0000000        28.00000      1.00000 0.0037878788             48 2079.770
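For a base R equivalent that ignores missing values rather than replacing them with 0, the summary function can drop NAs itself. A minimal sketch: multi_fun_narm and data_sub (a three-column subset of the question's data) are hypothetical names, and an all-NA column, such as ster_days for Site 6, yields NA for every statistic instead of NaN or Inf.

```r
# Hypothetical subset of the question's data (Site, ster_days, SPD), NAs kept
data_sub <- data.frame(
  Site      = c(7, 1, 7, 7, 1, 1, 7, 1, 6, 1, 1),
  ster_days = c(241, 135, 184, NA, 85, 106, NA, NA, NA, NA, 69),
  SPD       = c(1219.86, 1107, 1508, 442.74, 524.61, 1733.76, 2079.77,
                443.44, NA, 601.8, 1621.3)
)

# Variant of multi.fun that ignores NAs; an all-NA column gives all NA
multi_fun_narm <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) {
    return(c(mean = NA_real_, median = NA_real_, min = NA_real_, max = NA_real_))
  }
  c(mean = mean(x), median = median(x), min = min(x), max = max(x))
}

# Same split/sapply pattern: one 4-row matrix per Site, then stacked
my_list <- lapply(split(data_sub, data_sub$Site),
                  function(x) sapply(x, multi_fun_narm))
result <- do.call(rbind, my_list)
```

Rows 1 to 4 of result belong to Site 1, rows 5 to 8 to Site 6, and rows 9 to 12 to Site 7, in the sorted order that split() uses for the groups.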

edited Aug 10, 2021 at 21:11

answered Aug 10, 2021 at 20:02

TarJae


5 Comments

TarJae Over a year ago

@akrun: Dear master, could you please check my answer? Why do I get wrong numbers? Thank you.

Adam Over a year ago

I don't think you took into account the grouping by Site. Random, but do you happen to be a UNC Tar Heels fan (based on the username)?

akrun Over a year ago

As Adam mentioned, there is a group-by operation, i.e. lapply(split(data, data$Site), function(x) sapply(x, multi.fun))

TarJae Over a year ago

Thank you both very much, and special thanks to you, @akrun. Now I get a list with 3 elements; how do I turn it into a data frame (still learning base R fundamentals)? Thank you!

akrun Over a year ago

You can use do.call(rbind, ...) or do.call(cbind, ...).
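A small sketch of the difference between the two suggestions, using a hypothetical four-row data frame (toy_df) run through the same multi.fun/split/sapply pattern: rbind stacks the per-Site 4-row blocks vertically, as in the base R output above, while cbind places them side by side with repeated column names. Moving the duplicated row names into a stat column is one way, among others, to end up with an ordinary data frame.

```r
# Same helper as in the base R answer
multi.fun <- function(x) {
  c(mean = mean(x), median = median(x), min = min(x), max = max(x))
}

# Hypothetical small data set with two sites
toy_df <- data.frame(Site = c(1, 1, 7, 7), OS_days = c(68, 208, 82, 264))
my_list <- lapply(split(toy_df, toy_df$Site), function(x) sapply(x, multi.fun))

stacked <- do.call(rbind, my_list)  # 8 x 2: one 4-row block per Site
side    <- do.call(cbind, my_list)  # 4 x 4: blocks side by side

# Turn the duplicated row names into a column, then build a data frame
stat <- rownames(stacked)
rownames(stacked) <- NULL
tidy <- data.frame(stat = stat, stacked, stringsAsFactors = FALSE)
```

The stacked layout keeps one column per variable, which is usually easier to filter (for example by Site and stat) than the side-by-side matrix.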
