I would like to process a data frame through dplyr and ggplot using column names given as strings. Here is my code:

library(ggplot2)
library(dplyr)
my_df <- data.frame(var_1 = sample(c('a', 'b', 'c'), 1000, replace = TRUE),
                    var_2 = sample(c('d', 'e', 'f'), 1000, replace = TRUE))

name_list = c('var_1', 'var_2')

for(el in name_list){
  pdf(paste(el, '.pdf', sep =''))
    test <- my_df %>% group_by(el) %>% summarize(count = n())
    ggplot(data = test, aes(x = el, y = count)) + geom_bar(stat='identity')
  dev.off()
}

The above code obviously does not work, so I tried different things such as UQ and as.name. UQ creates a column with extra quotes in its name, and ggplot does not understand it with aes_string. Any suggestions?

I can use for (el in names(my_df)) with filtering, but would prefer to work with strings.

UPDATE: Here are the detailed messages/errors that I got:

for(el in name_list){
  pdf(paste(el, '.pdf', sep =''))
    test <- my_df %>% group_by(!!el) %>% summarize(count = n())
    ggplot(data = test, aes_string(x = el, y = 'count')) + geom_bar(stat='identity')
  dev.off()
}

The above code generates empty files.

for(el in name_list){
  pdf(paste(el, '.pdf', sep =''))
    test <- my_df %>% group_by(UQ(el)) %>% summarize(count = n())
    ggplot(data = test, aes_string(x = el, y = 'count')) + geom_bar(stat='identity')
  dev.off()
}

The above code also generates empty files.

for(el in name_list){
  pdf(paste(el, '.pdf', sep =''))
    test <- my_df %>% group_by(as.name(el)) %>% summarize(count = n())
    ggplot(data = test, aes_string(x = el, y = 'count')) + geom_bar(stat='identity')
  dev.off()
}

produces

Error in mutate_impl(.data, dots) : 
  Column `as.name(el)` is of unsupported type symbol

2 Answers

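You need to unquote the name/symbol with UQ() (or !!), not the string itself: an unquoted string is treated as a constant value, so convert it to a symbol with as.name() first. For example: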

for(el in name_list){
  pdf(paste(el, '.pdf', sep =''))
  test <- my_df %>% group_by(UQ(as.name(el))) %>% summarize(count = n())
  print(ggplot(data = test, aes_string(x = el, y = 'count')) + geom_bar(stat='identity'))
  dev.off()
}
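For readers on newer package versions, the same idea can be sketched without UQ() or aes_string(). This is only an illustration, assuming dplyr >= 1.0 (for across() and all_of()) and ggplot2 >= 3.0 (for the .data pronoun); the helper names count_by and plot_counts are hypothetical and not part of either package, and the plots are written to tempdir() rather than the working directory.

```r
library(ggplot2)
library(dplyr)

set.seed(1)
my_df <- data.frame(var_1 = sample(c('a', 'b', 'c'), 1000, replace = TRUE),
                    var_2 = sample(c('d', 'e', 'f'), 1000, replace = TRUE))

# Hypothetical helper: count rows per level of the column named by the string `col`
count_by <- function(df, col) {
  df %>%
    group_by(across(all_of(col))) %>%   # all_of() selects the column by its string name
    summarize(count = n(), .groups = "drop")
}

# Hypothetical helper: bar chart of those counts, mapping x through the .data pronoun
plot_counts <- function(df, col) {
  ggplot(count_by(df, col), aes(x = .data[[col]], y = count)) +
    geom_col() +        # same result as geom_bar(stat = 'identity')
    labs(x = col)       # label the axis with the column name
}

for (el in c('var_1', 'var_2')) {
  # Explicit width/height so ggsave() does not need to query an open device
  ggsave(file.path(tempdir(), paste0(el, '.pdf')), plot_counts(my_df, el),
         width = 6, height = 4)
}
```

Wrapping the plot in a function also fixes the column name per call, so the result does not depend on the value of the loop variable at the time the plot is drawn.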

1 Comment

Oh. You need to print() the ggplot object inside a loop with this method. Didn't think about that problem because I missed that part.
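The print() point can be illustrated with a small sketch. It assumes only ggplot2 and the built-in mtcars data set (not the data from the question), and writes to temporary files: a plot object that is merely evaluated inside a loop is not drawn, because auto-printing happens only at the top level.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()

without_print <- tempfile(fileext = ".pdf")
with_print    <- tempfile(fileext = ".pdf")

for (f in c(without_print, with_print)) {
  pdf(f)
  if (f == with_print) {
    print(p)   # explicit print() draws the plot on the open pdf device
  } else {
    p          # evaluated but never drawn: nothing is auto-printed inside a loop
  }
  dev.off()
}

# Compare the two file sizes; the file written without print() contains no page
file.size(c(without_print, with_print))
```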

I made two changes to your code:

  1. To group by a variable whose name is held in a string, use group_by_ instead of group_by in dplyr;
  2. To refer to that variable in ggplot2, use aes_string or get(variable).

I also made some minor changes (e.g. using ggsave to save the plots).

library(ggplot2)
library(dplyr)
my_df <- data.frame(var_1 = sample(c('a', 'b', 'c'), 1000, replace = TRUE),
                    var_2 = sample(c('d', 'e', 'f'), 1000, replace = TRUE))

name_list = c('var_1', 'var_2')

for(el in name_list){
    p <- my_df %>% 
         group_by_(el) %>% 
         summarize(count = n()) %>%
         ggplot(aes(x = get(el), y = count)) +
             geom_bar(stat = "identity")
    ggsave(paste0(el, ".pdf"), p)
}

2 Comments

ggsave made all the difference; without ggsave it does not work.
The reason ggsave is needed for this to work is that get() is evaluated lazily, when the plot is drawn rather than when it is created. You need ggsave inside the loop to make sure get() pulls the correct string. I found that if I use aes_string instead of get(), things are fine, i.e.: ggplot(aes_string(x = el, y = "count")) ...
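To illustrate the lazy-evaluation point, here is a sketch in which the plots are stored during the loop and saved only afterwards. It assumes ggplot2 >= 3.0, where aes() accepts !!, and uses rlang::sym() to turn the string into a symbol; the list name plots and the tempdir() output location are choices made for this example only.

```r
library(ggplot2)
library(dplyr)

set.seed(1)
my_df <- data.frame(var_1 = sample(c('a', 'b', 'c'), 1000, replace = TRUE),
                    var_2 = sample(c('d', 'e', 'f'), 1000, replace = TRUE))

name_list <- c('var_1', 'var_2')
plots <- list()

for (el in name_list) {
  test <- my_df %>%
    group_by(!!rlang::sym(el)) %>%
    summarize(count = n())
  # !!rlang::sym(el) is resolved here, while the plot is created, so the
  # mapping stores the column itself rather than a later lookup of `el`
  plots[[el]] <- ggplot(test, aes(x = !!rlang::sym(el), y = count)) +
    geom_col()
}

# After the loop `el` is 'var_2', yet each stored plot still maps its own column
for (nm in names(plots)) {
  ggsave(file.path(tempdir(), paste0(nm, '.pdf')), plots[[nm]],
         width = 6, height = 4)
}
```

With aes(x = get(el)) in place of the unquoted symbol, the lookup of el would be deferred until each plot is drawn, which is why that version only works when the plot is saved inside the loop.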
