3

I am trying to summarize data across multiple columns automatically if at all possible rather than writing code for each column independently. I would like to summarize this:

Patch    Size       Achmil  Aciarv   Aegpod Agrcap
  A       10            0       1      1      0
  B       2             1       0      0      0
  C       2             1       0      0      0
  D       2             1       0      0      0

into this

Species   Presence      MaxSize    MeanSize    Count
Achmil       0             10         10         1
Achmil       1             2           2         3
Aciarv       0             2           2         3
Aciarv       1             10          10        1

I know that I can individually run group_by and summarize for each column

achmil<-group_by(LimitArea, Achmil) %>%
  summarise(SumA=mean(Size))

but is there no way to automatically run this for each column for each presence and absence using some sort of loop? Any help is appreciated.

1
  • did you try to use lapply. Commented Mar 26, 2018 at 10:05

2 Answers 2

2

Perhaps we need to gather in to long format and then do the summarise

library(tidyverse)
gather(df1, Species, Presence, Achmil:Agrcap) %>% 
      group_by(Species, Presence)  %>% 
      summarise( MaxSize = max(Size), MeanSize = mean(Size), Count = n())
# A tibble: 7 x 5
# Groups: Species [?]
#  Species Presence MaxSize MeanSize Count
#  <chr>      <int>   <dbl>    <dbl> <int>
#1 Achmil         0   10.0     10.0      1
#2 Achmil         1    2.00     2.00     3
#3 Aciarv         0    2.00     2.00     3
#4 Aciarv         1   10.0     10.0      1
#5 Aegpod         0    2.00     2.00     3
#6 Aegpod         1   10.0     10.0      1
#7 Agrcap         0   10.0      4.00     4

In the newer version of dplyr/tidyr, we can use pivot_longer

df1 %>% 
   pivot_longer(cols = Achmil:Agrcap, names_to = "Species",
              values_to = "Presence") %>%
   group_by(Species, Presence) %>% 
   summarise(MaxSize = max(Size), MeanSize = mean(Size), Count = n())
Sign up to request clarification or add additional context in comments.

1 Comment

Of course! Thank you so much. Was rather silly of me to forget to first gather the data out of long form. Was too busy thinking of some loop... quite embarrassing
0

Here another solution using aggregate (and reshape2::melt())

library(reshape2)

df = melt(df[,2:ncol(df)], "Size")
aggregate(. ~ `variable`+`value`, data = df, 
          FUN = function(x) c(max = max(x), mean = mean(x), count = length(x)))

    variable value Size.max Size.mean Size.count
1   Achmil     0       10        10          1
2   Aciarv     0        2         2          3
3   Aegpod     0        2         2          3
4   Agrcap     0       10         4          4
5   Achmil     1        2         2          3
6   Aciarv     1       10        10          1
7   Aegpod     1       10        10          1

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.