I have a data frame of the form below:

ID <- c(1, 2, 3, 4, 5)
Type1 <- c("A", "", "A", "B", "C")
Count1 <- c(40, NA, 10, 5, 100)
Type2 <- c("D", "", "", "C", "D")
Count2 <- c(5, NA, NA, 30, 5)
Type3 <- c("E", "", "", "D", "")
Count3 <- c(10, NA, NA, 5, NA)
df <- data.frame(ID, Type1, Count1, Type2, Count2, Type3, Count3)

I would like to sum the values in the "Count" columns when they share the same "Type". That is, wherever the values in Type1, Type2, or Type3 match, the corresponding values in Count1, Count2, and Count3 should be added together.

Ideally, I could get an output of the form below:

Type <- c("A", "B", "C", "D", "E")
n <- c(2, 1, 2, 3, 1)
Total <- c(50, 5, 130, 15, 10)

result <- data.frame(Type, n, Total)

I was able to achieve this using the following code, but it's quite clunky. I'm sure there is a more elegant method!

library(dplyr)

df1 <- data.frame(Type1, Count1)
df2 <- data.frame(Type2, Count2)
df3 <- data.frame(Type3, Count3)

colnames(df1) <- c("Type", "Count")
colnames(df2) <- c("Type", "Count")
colnames(df3) <- c("Type", "Count")

df_all <- rbind(df1, df2, df3)

result <- df_all %>% group_by(Type) %>% 
     summarize(num = n(),
               total = sum(Count))

3 Answers

This is easy once your data are in long format. You can reshape them with pivot_longer() from the tidyr package, which works for any number of column pairs (e.g. Type1, Type2, Type3, ... and Count1, Count2, Count3, ...).

library(tidyr)
library(dplyr)

df |>
  pivot_longer(cols = matches("^(Count|Type)"),
               # \\D+ captures the name stem and \\d+ the full numeric suffix, so Count10 etc. also work
               names_pattern = "^(\\D+)(\\d+)$", names_to = c(".value", NA)) |>
  summarize(n = n(), Total = sum(Count, na.rm = TRUE), .by = Type) |>
  filter(nzchar(Type))
#   Type      n Total
#   <chr> <int> <dbl>
# 1 A         2    50
# 2 D         3    15
# 3 E         1    10
# 4 B         1     5
# 5 C         2   130
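
To make the reshaping step easier to follow, here is a minimal sketch of the intermediate long-format table on its own. The small data frame `toy` is a hypothetical stand-in for `df`, and the pattern `"^(\\D+)(\\d+)$"` is an assumption of this sketch: it splits each column name into a non-digit stem (used as the new column name via `".value"`) and a numeric suffix (discarded via `NA`).

```r
library(tidyr)

# Hypothetical toy data with two Type/Count pairs
toy <- data.frame(
  ID     = 1:3,
  Type1  = c("A", "", "A"),
  Count1 = c(40, NA, 10),
  Type2  = c("D", "", ""),
  Count2 = c(5, NA, NA)
)

# Each Type<k>/Count<k> pair becomes one row per ID
long <- pivot_longer(
  toy,
  cols = -ID,
  names_pattern = "^(\\D+)(\\d+)$",  # stem, then numeric suffix
  names_to = c(".value", NA)         # keep the stem as column name, drop the suffix
)

names(long)  # "ID" "Type" "Count"
nrow(long)   # 6: three IDs times two pairs
```

Once the data are in this shape, grouping by `Type` and summing `Count` is a single summarize() call.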

Here is a quick way by reorganizing the data frame slightly before using summarize.

result <- bind_rows(select(df, Type = Type1, Count = Count1),
                    select(df, Type = Type2, Count = Count2),
                    select(df, Type = Type3, Count = Count3)) %>%
  group_by(Type) %>%
  summarize(n = n(),
            Total = sum(Count))

result
> # A tibble: 6 × 3
>   Type      n Total
>   <chr> <int> <dbl>
> 1 ""        6    NA
> 2 "A"       2    50
> 3 "B"       1     5
> 4 "C"       2   130
> 5 "D"       3    15
> 6 "E"       1    10

You can also drop the first (blank) row and convert the result to a data frame:

result <- result %>% filter(Type != "") %>%
  as.data.frame()

result
>   Type n Total
> 1    A 2    50
> 2    B 1     5
> 3    C 2   130
> 4    D 3    15
> 5    E 1    10

Reshape to long format, then aggregate.

> df |> 
+   reshape(idvar='ID', varying=2:7, sep='', direction='long') |> 
+   aggregate(Count ~ Type, \(x) c(n=length(x), Total=sum(x))) |> 
+   do.call(what='data.frame')  ## needed to get rid of matrix column
  Type Count.n Count.Total
1    A       2          50
2    B       1           5
3    C       2         130
4    D       3          15
5    E       1          10
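
For comparison, the same "stack, then aggregate" idea can be sketched in base R without reshape(). This is only an illustration, not the method used above: it assumes the Type and Count columns appear in matching order (Type1 with Count1, and so on), and the helper names `type_cols`, `count_cols`, `long`, and `tab` are made up for this example.

```r
# Data from the question
df <- data.frame(
  ID     = c(1, 2, 3, 4, 5),
  Type1  = c("A", "", "A", "B", "C"), Count1 = c(40, NA, 10, 5, 100),
  Type2  = c("D", "", "", "C", "D"),  Count2 = c(5, NA, NA, 30, 5),
  Type3  = c("E", "", "", "D", ""),   Count3 = c(10, NA, NA, 5, NA)
)

# Pick the paired columns by name prefix (assumes matching order)
type_cols  <- grep("^Type",  names(df), value = TRUE)
count_cols <- grep("^Count", names(df), value = TRUE)

# Stack all Type columns and all Count columns into two long vectors
long <- data.frame(
  Type  = unlist(df[type_cols],  use.names = FALSE),
  Count = unlist(df[count_cols], use.names = FALSE)
)

# Drop the blank types before aggregating
long <- long[nzchar(long$Type), ]

# Count rows and sum counts per type
tab <- table(long$Type)
result <- data.frame(
  Type  = names(tab),
  n     = as.vector(tab),
  Total = as.vector(tapply(long$Count, long$Type, sum, na.rm = TRUE))
)

result
#   Type n Total
# 1    A 2    50
# 2    B 1     5
# 3    C 2   130
# 4    D 3    15
# 5    E 1    10
```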
