dplyr summarise multiple columns and output multiple dataframes

Question

I want to be able to summarise multiple columns separately and have a separate dataframe output for each summary. Right now, I'm doing it manually:

Example:

library(dplyr)
library(ggplot2)  # provides the mpg dataset
manufacturer = mpg %>% 
  select(manufacturer) %>% 
  group_by(manufacturer) %>% 
  summarise(
    count = n()
  )

model = mpg %>% 
  select(model) %>% 
  group_by(model) %>% 
  summarise(
    count = n()
  )

## etc. for each column of mpg.

Is there a way to do this automatically in some kind of a loop? I want the dataframe names to be the column names.

akrun · Accepted Answer · 2022-09-24 11:56:33Z

We can loop over the column names:

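library(dplyr)
library(purrr)
library(ggplot2)  # provides the mpg dataset
# Name the vector of column names so the resulting list is named by column
lst1 <- map(setNames(names(mpg), names(mpg)),
  ~ mpg %>%
      select(all_of(.x)) %>%
      group_by(across(all_of(.x))) %>%
      summarise(count = n()))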

It is better to keep the results in a list. If separate objects are needed, use list2env:

list2env(lst1, .GlobalEnv)
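As a rough sketch of why the list alone is often enough, the summaries can be pulled out by column name without creating any global objects. This rebuilds the list with the same pipeline as above; loading `ggplot2` for `mpg` and the use of `set_names()` in place of `setNames()` are assumptions of this example, not part of the answer.

```r
library(dplyr)
library(purrr)
library(ggplot2)  # assumption: loaded only for the mpg dataset

# One count table per column, named after the column it summarises
lst1 <- map(set_names(names(mpg)), function(col) {
  mpg %>%
    group_by(across(all_of(col))) %>%
    summarise(count = n())
})

# Access a single summary by name
lst1$manufacturer
lst1[["model"]]

# Work on all summaries at once, e.g. number of distinct values per column
map_int(lst1, nrow)
```

Keeping everything in `lst1` also makes it straightforward to apply the same follow-up step to every summary with `map()`.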

Maël · Accepted Answer · 2022-09-24 11:56:49Z

You just need count here. Put it in a loop over all columns (using imap):

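library(tidyverse)
imap(mpg, ~ {
  nm1 <- .y  # column name; saved because the inner lambda has its own .x/.y
  count(data.frame(x = .x), x, name = "count") %>%
    rename_with(~ nm1, 1)  # rename the first column back to the original name
})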

Then to put the data frames of your list into your global environment, use list2env.
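A minimal sketch of that last step, under some assumptions: the list is stored in a hypothetical variable `counts`, and the objects are written into a fresh environment `env` rather than `.GlobalEnv` so that nothing in the workspace is overwritten. Passing `envir = .GlobalEnv` instead would create the objects at the top level.

```r
library(dplyr)
library(purrr)
library(ggplot2)  # assumption: loaded only for the mpg dataset

# Same idea as the imap() call above, written with a named function
counts <- imap(mpg, function(col, nm) {
  count(tibble(x = col), x, name = "count") %>%
    rename_with(~ nm, 1)
})

# Copy each list element into an environment as its own object
env <- new.env()
list2env(counts, envir = env)

ls(env)   # one object per column of mpg
env$cyl   # the count table for the cyl column
```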

Ronak Shah · Accepted Answer · 2022-09-24 12:58:28Z

Another option is to reshape the data to long format with pivot_longer and count each value in each column. However, this requires converting all column values to character. If separate dataframes are needed, you can use group_split to split the single dataframe into a list of dataframes.

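library(dplyr)
library(tidyr)
library(ggplot2)  # provides the mpg dataset

mpg %>%
  # across() without column selection is deprecated, so select everything()
  mutate(across(everything(), as.character)) %>%
  pivot_longer(cols = everything()) %>%
  count(name, value, name = "count") %>%
  group_split(name, .keep = FALSE)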

#[[1]]
# A tibble: 7 × 2
#  value      count
#  <chr>      <int>
#1 2seater        5
#2 compact       47
#3 midsize       41
#4 minivan       11
#5 pickup        33
#6 subcompact    35
#7 suv           62

#[[2]]
# A tibble: 21 × 2
#   value count
#   <chr> <int>
# 1 11       20
# 2 12        8
# 3 13       21
# 4 14       19
# 5 15       24
#...
#...

As others have already pointed out, it is better to keep the data in a list than in smaller individual dataframes.
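Note that group_split returns an unnamed list, while the question asks for dataframes named after the columns. One possible way to get a named list, sketched here with base R's `split()` as a swapped-in alternative to group_split (the variable names `long_counts` and `named_list` are hypothetical):

```r
library(dplyr)
library(tidyr)
library(ggplot2)  # assumption: loaded only for the mpg dataset

# Long-format counts: one row per (column name, value) pair
long_counts <- mpg %>%
  mutate(across(everything(), as.character)) %>%
  pivot_longer(cols = everything()) %>%
  count(name, value, name = "count")

# split() names each piece after the grouping value, i.e. the original column
named_list <- split(select(long_counts, -name), long_counts$name)

names(named_list)
named_list$class
```

The named list can then be passed to list2env if separate objects are really needed.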
