I have written a function that draws histograms with ggplot2 for the numeric columns of a data frame passed to it. The function stores the plots in a list and returns the list.

However, when I run the function I get the same plot over and over.

My code is below, followed by a reproducible example.

hist_of_columns = function(data, class, variables_to_exclude = c()){

    library(ggplot2)
    library(ggthemes)

    data = as.data.frame(data)

    variables_numeric = names(data)[unlist(lapply(data, function(x){is.numeric(x) | is.integer(x)}))]

    variables_not_to_plot = c(class, variables_to_exclude)



    variables_to_plot = setdiff(variables_numeric, variables_not_to_plot)

    indices = match(variables_to_plot, names(data))

    index_of_class = match(class, names(data))

    plots = list()

    for (i in (1 : length(variables_to_plot))){



          p  = ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class])) +
           geom_histogram(aes(y=..density..), alpha=0.3,
           position="identity", bins = 100)+ theme_economist() +
           geom_density(alpha=.2) + xlab(names(data)[indices[i]]) + labs(fill = class) + guides(color = FALSE)

          name = names(data)[indices[i]]

          plots[[name]] = p
    }

   plots

}


data(mtcars)

mtcars$am = factor(mtcars$am)

data = mtcars

variables_to_exclude = 'mpg'

class = 'am'

plots = hist_of_columns(data, class, variables_to_exclude)

If you inspect the list plots, you will find that it contains the same plot repeated.

2 Answers

Simply use aes_string to pass column names as strings into the ggplot() call. Right now, the aesthetics are built from vectors extracted from the data frame rather than from columns named in ggplot's data argument. Below, x, color, and fill are separate, unrelated vectors: although they derive from the same source, ggplot does not know that:

ggplot(data, aes(x= data[, indices[i]], color= data[, index_of_class], fill=data[, index_of_class]))

However, with aes_string, the string names passed to x, color, and fill refer to columns of data:

ggplot(data, aes_string(x= names(data)[indices[i]], color= class, fill= class))
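Note that aes_string has been soft-deprecated since ggplot2 3.0.0. A minimal sketch of the same idea using the .data pronoun inside aes() is shown below. The helper name hist_by_name and the bin count are illustrative assumptions, not part of the original post, and after_stat() assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical helper (name and defaults are assumptions for illustration):
# builds one histogram per numeric column, looking columns up by name in `data`.
hist_by_name <- function(data, class, variables_to_exclude = character()) {
  data <- as.data.frame(data)
  is_num <- vapply(data, is.numeric, logical(1))
  vars <- setdiff(names(data)[is_num], c(class, variables_to_exclude))

  # lapply() gives each plot its own `v`, so nothing depends on a loop counter
  plots <- lapply(vars, function(v) {
    ggplot(data, aes(x = .data[[v]],
                     colour = .data[[class]],
                     fill = .data[[class]])) +
      geom_histogram(aes(y = after_stat(density)), alpha = 0.3,
                     position = "identity", bins = 30) +
      geom_density(alpha = 0.2) +
      labs(x = v, fill = class) +
      guides(colour = "none")
  })
  setNames(plots, vars)
}

mtcars$am <- factor(mtcars$am)
plots <- hist_by_name(mtcars, "am", "mpg")
names(plots)
```

Because every aesthetic refers to a column of data by name, each plot is tied to its own variable regardless of when it is rendered.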
Here is a strategy using tidyeval that does what you are after:

library(rlang)
library(tidyverse)

hist_of_cols <- function(data, class, drop_vars) {

    # tidyeval overhead
    class_enq <- enquo(class)
    drop_enqs <- enquo(drop_vars)

    data %>%
        group_by(!!class_enq) %>% # keep the 'class' column always
        select(-!!drop_enqs) %>% # drop any 'drop_vars'
        select_if(is.numeric) %>% # keep only numeric columns
        gather("key", "value", -!!class_enq) %>% # go to long form
        split(.$key) %>% # make a list of data frames
        map(~ ggplot(., aes(value, fill = !!class_enq)) + # plot as usual
                geom_histogram() +
                geom_density(alpha = .5) +
                labs(x = unique(.$key)))

}
hist_of_cols(mtcars, am, mpg)

hist_of_cols(mtcars, am, c(mpg, wt))

2 Comments

Your gather() call has -am as an argument, but I would like a general form (mtcars is only an example). Moreover, would it be possible to pass class as a string denoting the name of the class variable, and drop_vars as a vector of strings naming the variables to drop? Finally, can you explain why my code is malfunctioning?
Updated the issue in gather(). The reason your current solution isn't working is that you are passing a vector via data[, index] to each aes() variable. Because ggplot2 evaluates aesthetics lazily, it only looks up the actual values of that vector after the loop has finished, when the plot is rendered. So it always sees the same vector (because i is at its maximum) and you get a list of identical plots.
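
The effect described in this comment can be reproduced without ggplot2. The following is only a base-R analogy under that assumption, not ggplot2's actual internals: expressions that mention i are stored unevaluated in a loop and evaluated afterwards. The function names demo_late and demo_fixed are made up for illustration.

```r
# Store expressions that refer to `i` and evaluate them only after the loop.
demo_late <- function(values) {
  captured <- list()
  for (i in seq_along(values)) {
    captured[[i]] <- quote(values[[i]])   # keeps the expression, not its value
  }
  env <- environment()
  # By now i equals length(values), so every expression sees the last element
  vapply(captured, function(e) eval(e, envir = env), numeric(1))
}

# Substitute the current value of `i` into each expression when it is created.
demo_fixed <- function(values) {
  captured <- list()
  for (i in seq_along(values)) {
    captured[[i]] <- bquote(values[[.(i)]])   # i is replaced by 1L, 2L, ...
  }
  env <- environment()
  vapply(captured, function(e) eval(e, envir = env), numeric(1))
}

demo_late(c(10, 20, 30))    # 30 30 30
demo_fixed(c(10, 20, 30))   # 10 20 30
```

Referring to columns by name, as in the first answer, sidesteps the problem in the same way: nothing in the stored aesthetic depends on the loop counter.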