
I want to be able to summarise multiple columns separately and have a separate dataframe output for each summary. Right now, I'm doing it manually:

Example:

manufacturer = mpg %>% 
  select(manufacturer) %>% 
  group_by(manufacturer) %>% 
  summarise(
    count = n()
  )

model = mpg %>% 
  select(model) %>% 
  group_by(model) %>% 
  summarise(
    count = n()
  )

## etc. for each column of mpg.

Is there a way to do this automatically in some kind of a loop? I want the dataframe names to be the column names.

3 Answers


We can loop over the column names with map, naming the input vector so that the resulting list is named after the columns:

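library(dplyr)
library(purrr)
library(ggplot2)  # provides the mpg dataset

# Named input vector, so each list element is named after its column
lst1 <- map(setNames(names(mpg), names(mpg)),
  ~ mpg %>%
      select(all_of(.x)) %>%
      group_by(across(all_of(.x))) %>%
      summarise(count = n()))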

It is better to keep the results in a list. If you do want separate objects, use list2env:

list2env(lst1, .GlobalEnv)
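
To illustrate the two options, here is a minimal base R sketch. The toy data frame `df`, the list `counts`, and the environment `env` are hypothetical stand-ins (not the mpg data or the code above), and the sketch copies into a fresh environment rather than `.GlobalEnv` so that it cannot overwrite existing objects:

```r
# Hypothetical toy data standing in for mpg
df <- data.frame(
  drv = c("f", "f", "4", "r"),
  cyl = c(4, 6, 4, 8)
)

# One count table per column, in a list named after the columns
counts <- lapply(setNames(names(df), names(df)), function(col) {
  as.data.frame(table(df[[col]]), responseName = "count")
})

# Option 1: keep the list and pull out elements by name
counts$drv
counts[["cyl"]]

# Option 2: copy every list element into an environment as its own object
env <- new.env()
list2env(counts, envir = env)
ls(env)
```

Passing `.GlobalEnv` as the `envir` argument instead creates the objects in the workspace, as in the call above.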

You only need count here. Apply it to every column in a loop with imap, which passes each column as .x and its name as .y:

library(tidyverse)
imap(mpg, ~ {
  nm1 <- .y  # column name supplied by imap
  count(data.frame(x = .x), x, name = "count") %>%
    rename_with(~ nm1, 1)  # rename the first column back to the original name
})

Then to put the data frames of your list into your global environment, use list2env.


Another option is to reshape the data to long format with pivot_longer and count each value within each column. This requires converting all column values to character first, because pivot_longer cannot stack columns of different types into a single value column. If you need separate data frames, use group_split to split the result into a list of data frames.

library(dplyr)
library(tidyr)
library(ggplot2)  # provides the mpg dataset

mpg %>%
  mutate(across(everything(), as.character)) %>%
  pivot_longer(cols = everything()) %>%
  count(name, value, name = "count") %>%
  group_split(name, .keep = FALSE)

#[[1]]
# A tibble: 7 × 2
#   value      count
#   <chr>      <int>
# 1 2seater        5
# 2 compact       47
# 3 midsize       41
# 4 minivan       11
# 5 pickup        33
# 6 subcompact    35
# 7 suv           62

#[[2]]
# A tibble: 21 × 2
#   value count
#   <chr> <int>
# 1 11       20
# 2 12        8
# 3 13       21
# 4 14       19
# 5 15       24
#...
#...

As others have already pointed out, it is better to keep the data in a list than in many small individual data frames.
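
One reason a list is easier to work with is that every summary can be processed in a single call. The following is a minimal base R sketch of that idea; the toy data frame `df` and the helper `count_values` are hypothetical and are not part of the code above:

```r
# Hypothetical toy data standing in for mpg
df <- data.frame(
  drv = c("f", "f", "4", "r"),
  cyl = c(4, 6, 4, 8)
)

# Hypothetical helper: count how often each value occurs in a vector
count_values <- function(x) {
  tab <- table(x)
  data.frame(value = names(tab), count = as.vector(tab))
}

# lapply over a data frame keeps the column names as list names
counts <- lapply(df, count_values)

# Apply one operation to every summary at once,
# here the number of distinct values per column
sapply(counts, nrow)

# Stack all summaries into one data frame, tagging rows with the column name
combined <- do.call(
  rbind,
  Map(function(d, nm) cbind(column = nm, d), counts, names(counts))
)
combined
```

With separate objects in the workspace, each of these steps would have to be repeated once per data frame.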
