Summarize data table individually for multiple columns

Question

I am trying to summarize data across multiple columns automatically if at all possible rather than writing code for each column independently. I would like to summarize this:

Patch    Size       Achmil  Aciarv   Aegpod Agrcap
  A       10            0       1      1      0
  B       2             1       0      0      0
  C       2             1       0      0      0
  D       2             1       0      0      0

into this

Species   Presence      MaxSize    MeanSize    Count
Achmil       0             10         10         1
Achmil       1             2           2         3
Aciarv       0             2           2         3
Aciarv       1             10          10        1

I know that I can individually run group_by and summarize for each column

achmil<-group_by(LimitArea, Achmil) %>%
  summarise(SumA=mean(Size))

but is there no way to automatically run this for each column for each presence and absence using some sort of loop? Any help is appreciated.

did you try to use lapply.

user7905871
– user7905871

2018-03-26 10:05:45 +00:00
Commented Mar 26, 2018 at 10:05 — user7905871
– user7905871, Commented Mar 26, 2018 at 10:05

akrun · Accepted Answer · 2020-01-05 21:49:21Z

2

Perhaps we need to gather in to long format and then do the summarise

library(tidyverse)
gather(df1, Species, Presence, Achmil:Agrcap) %>% 
      group_by(Species, Presence)  %>% 
      summarise( MaxSize = max(Size), MeanSize = mean(Size), Count = n())
# A tibble: 7 x 5
# Groups: Species [?]
#  Species Presence MaxSize MeanSize Count
#  <chr>      <int>   <dbl>    <dbl> <int>
#1 Achmil         0   10.0     10.0      1
#2 Achmil         1    2.00     2.00     3
#3 Aciarv         0    2.00     2.00     3
#4 Aciarv         1   10.0     10.0      1
#5 Aegpod         0    2.00     2.00     3
#6 Aegpod         1   10.0     10.0      1
#7 Agrcap         0   10.0      4.00     4

In the newer version of dplyr/tidyr, we can use pivot_longer

df1 %>% 
   pivot_longer(cols = Achmil:Agrcap, names_to = "Species",
              values_to = "Presence") %>%
   group_by(Species, Presence) %>% 
   summarise(MaxSize = max(Size), MeanSize = mean(Size), Count = n())

edited Jan 5, 2020 at 21:49

answered Mar 26, 2018 at 10:06

akrun

891k38 gold badges590 silver badges700 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Kevin Over a year ago

Of course! Thank you so much. Was rather silly of me to forget to first gather the data out of long form. Was too busy thinking of some loop... quite embarrassing

ChrKoenig · Accepted Answer · 2018-03-26 10:40:34Z

0

Here another solution using aggregate (and reshape2::melt())

library(reshape2)

df = melt(df[,2:ncol(df)], "Size")
aggregate(. ~ `variable`+`value`, data = df, 
          FUN = function(x) c(max = max(x), mean = mean(x), count = length(x)))

    variable value Size.max Size.mean Size.count
1   Achmil     0       10        10          1
2   Aciarv     0        2         2          3
3   Aegpod     0        2         2          3
4   Agrcap     0       10         4          4
5   Achmil     1        2         2          3
6   Aciarv     1       10        10          1
7   Aegpod     1       10        10          1

answered Mar 26, 2018 at 10:40

ChrKoenig

9831 gold badge9 silver badges24 bronze badges

Collectives™ on Stack Overflow

Summarize data table individually for multiple columns

2 Answers 2

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related