gtsummary with "top 10"

Question

Is there a way to only have the 10 most frequent entries for a gtsummary tbl_summary with categorical data?

I'm currently using the following code:

library(forcats)
library(dplyr)
library(magrittr)

table <- fct_count(df$name, sort = TRUE, prop = TRUE) %>%
  slice_head(n = 10)
table$p <- round(table$p, digits = 3)
table$p <- table$p * 100
table %<>% rename(Organism = f, `%` = p)
table

This prints a nice table in the console.

But ideally I would have it in a gtsummary table, because that is what the rest of my report uses. I can build the tbl_summary without problems; I just can't figure out how to limit it to the 10 most common organisms, and I haven't seen this asked or answered anywhere.

Example dataset:

library(AMR)
df <- data.table::as.data.table(example_isolates)
df$name <- mo_name(df$mo)

Edward · Accepted Answer · 2024-10-24 07:41:30Z

2

You could determine the top 10 beforehand and then convert the name to a factor with "Other" as the last level.

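library(AMR)
library(dplyr)
library(gtsummary)

df <- data.table::as.data.table(example_isolates)
df$name <- mo_name(df$mo)
df  # inspect the data

# Names of the 10 most frequent organisms, most frequent first
top10 <- names(rev(tail(sort(table(df$name)), 10)))

df %>%
  # Keep the top-10 names and recode everything else to "(Other)";
  # the explicit levels put "(Other)" last in the table
  mutate(name = factor(case_when(name %in% top10 ~ name,
                                 .default = "(Other)"),
                       levels = c(top10, "(Other)"))) %>%
  tbl_summary(include = name)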
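The recoding step can be checked in isolation on a small vector. This is only a sketch in base R: the values in `x` and the cutoff `k` are made up for illustration and are not taken from example_isolates.

```r
# Toy data (hypothetical organism labels, not from example_isolates)
x <- c(rep("A", 5), rep("B", 3), rep("C", 2), "D", "E")
k <- 2  # keep the two most frequent values

# Same idiom as above: sort counts ascending, take the last k, reverse
top_k <- names(rev(tail(sort(table(x)), k)))

# Recode everything else to "(Other)" and fix the level order
x_top <- factor(ifelse(x %in% top_k, x, "(Other)"),
                levels = c(top_k, "(Other)"))

table(x_top)
```

Because the levels are set explicitly, "(Other)" stays last regardless of how many rows it contains.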


thothal · Accepted Answer · 2024-10-24 07:50:14Z

2

Since tbl_summary does not provide this functionality out of the box, you could first lump all levels outside the top 10 into a special category and then remove that entry with remove_row_type:

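remove_me <- "(Remove)"
# Lump everything outside the 10 most frequent names into one placeholder level
df <- df %>%
  mutate(name2 = fct_lump(name, 10, other_level = remove_me))

# Sort levels by frequency, then drop the placeholder row from the table
tbl_summary(df, include = name2,
            sort = all_categorical(FALSE) ~ "frequency") %>%
  remove_row_type(name2, type = "level", level_value = remove_me)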

Personally, I would probably keep the lumped level in the table, labeled "(Other)" and placed at the end.

other_category <- "(Other)"
df <- df %>% 
  mutate(name3 = fct_lump(name, 10, other_level = other_category) %>%
                    fct_infreq() %>%
                    fct_relevel(other_category, after = Inf))

tbl_summary(df, include = name3)
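To see why both fct_infreq() and fct_relevel() appear in that pipeline, here is a small sketch on made-up data (the vector `x` is hypothetical). It uses fct_lump_n(), which the forcats documentation recommends over the older fct_lump(); with a positive count the two are assumed here to behave the same way.

```r
library(forcats)

# Toy data (hypothetical labels): A x5, B x3, C x2, D x1, E x1
x <- factor(c(rep("A", 5), rep("B", 3), rep("C", 2), "D", "E"))

# Keep the 2 most frequent levels, lump the rest (4 rows) into "(Other)"
lumped <- fct_lump_n(x, n = 2, other_level = "(Other)")

# Ordering by frequency alone would put "(Other)" (4 rows) ahead of B (3 rows)
by_freq <- fct_infreq(lumped)

# Moving "(Other)" to the end restores the intended layout
final <- fct_relevel(by_freq, "(Other)", after = Inf)

levels(by_freq)
levels(final)
```

The lumped level can outrank a genuine top-n level once the leftovers are pooled, which is why it is moved to the end explicitly.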

edited Oct 24, 2024 at 7:50

answered Oct 24, 2024 at 7:14


2 Comments

Edward Over a year ago

The percentages are not based on the original data.

thothal Over a year ago

Good point, I removed solution 1.
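The denominator issue raised in these comments can be illustrated with base R. This sketch assumes the concern is that dropping rows before summarising shrinks the total the percentages are computed from; the data and the `top` vector are made up.

```r
# Toy data (hypothetical labels): 10 rows in total
x <- c(rep("A", 5), rep("B", 3), rep("C", 2))
top <- c("A", "B")

# Filtering first: percentages are relative to the 8 rows that remain
pct_filtered <- round(100 * prop.table(table(x[x %in% top])), 1)

# Lumping instead: percentages stay relative to all 10 rows
lumped <- ifelse(x %in% top, x, "(Other)")
pct_lumped <- round(100 * prop.table(table(lumped)), 1)

pct_filtered[["A"]]  # 62.5
pct_lumped[["A"]]    # 50
```

Keeping an "(Other)" level therefore preserves percentages that match the original data.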

Friede · Accepted Answer · 2024-10-24 12:01:32Z

2

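library(gtsummary)
# df0: assumed to be a copy of the question's df (not defined in this answer)
top10 <- names(head(sort(table(df0$name), decreasing = TRUE), 10))
df0$name[!df0$name %in% top10] <- "Other"  # recode everything outside the top 10
tbl_summary(data.frame(name = df0$name),
            sort = all_categorical() ~ "frequency") # optional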

