Conditional sum of multiple columns based on multiple (other) columns

Question

I have a data frame of the form below:

ID <- c(1, 2, 3, 4, 5)
Type1 <- c("A", "", "A", "B", "C")
Count1 <- c(40, NA, 10, 5, 100)
Type2 <- c("D", "", "", "C", "D")
Count2 <- c(5, NA, NA, 30, 5)
Type3 <- c("E", "", "", "D", "")
Count3 <- c(10, NA, NA, 5, NA)
df <- data.frame(ID, Type1, Count1, Type2, Count2, Type3, Count3)

I would like to sum the values in the "Count" columns IF they are of the same "Type". I.e., if Type1, Type2, or Type3 match, sum the corresponding value in Count1, Count2, and Count3.

Ideally, I could get an output of the form below:

Type <- c("A", "B", "C", "D", "E")
n <- c(2, 1, 2, 3, 1)
Total <- c(50, 5, 130, 15, 10)

result <- data.frame(Type, n, Total)

I was able to achieve this using the following code, but it's quite clunky. I'm sure there is a more elegant method!

df1 <- data.frame(Type1, Count1)
df2 <- data.frame(Type2, Count2)
df3 <- data.frame(Type3, Count3)

colnames(df1) <- c("Type", "Count")
colnames(df2) <- c("Type", "Count")
colnames(df3) <- c("Type", "Count")

df_all <- rbind(df1, df2, df3)

result <- df_all %>% group_by(Type) %>% 
     summarize(num = n(),
               total = sum(Count))

LMc · Accepted Answer · 2024-06-11 19:02:37Z

3

This is easily done if your data are in long format. You can get it into long format using pivot_longer() from the tidyr package. This will work for any number of columns (e.g. Type1, Type2, Type3, ... and Count1, Count2, Count3, ...)

library(tidyr)
library(dplyr)

df |>
  pivot_longer(cols = matches("^(Count|Type)"),
               names_pattern = "(^\\w+)(\\d+)", names_to = c(".value", NA)) |>
  summarize(n = n(), Total = sum(Count, na.rm = TRUE), .by = Type) |>
  filter(nzchar(Type))
#   Type      n Total
#   <chr> <int> <dbl>
# 1 A         2    50
# 2 D         3    15
# 3 E         1    10
# 4 B         1     5
# 5 C         2   130

edited Jun 11, 2024 at 19:02

answered Jun 11, 2024 at 17:57

LMc

19k4 gold badges41 silver badges54 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

apple · Accepted Answer · 2024-06-11 17:49:48Z

Here is a quick way by reorganizing the data frame slightly before using summarize.

result <- bind_rows(select(df, Type = Type1, Count = Count1),
                    select(df, Type = Type2, Count = Count2),
                    select(df, Type = Type3, Count = Count3)) %>%

  group_by(Type) %>%
  summarize(n = n(),
            Total = sum(Count))

result
> # A tibble: 6 × 3
>   Type      n Total
>   <chr> <int> <dbl>
> 1 ""        6    NA
> 2 "A"       2    50
> 3 "B"       1     5
> 4 "C"       2   130
> 5 "D"       3    15
> 6 "E"       1    10

You can also convert to a data frame and remove the first (blank) column:

result <- result %>% filter(Type != "") %>%
  as.data.frame()

result
>   Type n Total
> 1    A 2    50
> 2    B 1     5
> 3    C 2   130
> 4    D 3    15
> 5    E 1    10

jay.sf · Accepted Answer · 2024-06-12 04:46:26Z

0

reshape long, then aggregate.

> df |> 
+   reshape(idvar='ID', varying=2:7, sep='', direction='long') |> 
+   aggregate(Count ~ Type, \(x) c(n=length(x), Total=sum(x))) |> 
+   do.call(what='data.frame')  ## needed to get rid of matrix column
  Type Count.n Count.Total
1    A       2          50
2    B       1           5
3    C       2         130
4    D       3          15
5    E       1          10

answered Jun 12, 2024 at 4:46

jay.sf

76.3k8 gold badges66 silver badges132 bronze badges

Collectives™ on Stack Overflow

Conditional sum of multiple columns based on multiple (other) columns

3 Answers 3

Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related