I have a dataset with numeric and factor variables. I want to put the plots of the numeric variables on one page and the plots of the factor variables on another. First of all, I select the factor variables and their column indices.

My df is the iris dataset.

df<-iris
df$y<-sample(0:1,nrow(iris),replace=TRUE)
fact<-colnames(df)[sapply(df,is.factor)]
index_fact<-which(names(df)%in%fact)

Then I calculate the number of remaining (numeric) variables:

nm<-ncol(df)-length(fact)

The next step is to create the loop:

i_F=1
i_N=1
list_plotN<- list()
list_plotF<- list()

for (i in 1:length(df)){
  plot <- ggplot(df,aes(x=df[,i],color=y,fill=y))+xlab(names(df)[i]) 

  if (is.factor(df[,i])){
    p_factor<-plot+geom_bar()
    list_plotF[[i_F]]<-p_factor
    i_F=i_F+1
  }else{
    p_numeric <- plot+geom_histogram()
    list_plotN[[i_N]]<-p_numeric
    i_N=i_N+1
  }
}

When I look at list_plotF and list_plotN, the result is not right: every plot shows the same variable. I don't know what I'm doing wrong.

thanks!!!

  • I cannot see any value for y in your example. Commented Jan 15, 2019 at 17:47
  • This might help: stackoverflow.com/a/50383146/786542 Commented Jan 15, 2019 at 18:02

2 Answers

I don't really follow your for loop all that well, but from what I can see it ends up storing the same plot (the last one) on every iteration. I've reconstructed what I think you need using lapply, which I generally prefer to for loops whenever I can.

lapply takes a list of values and a function, and applies that function to every value. You can define the function separately, as I have, so everything looks cleaner, and then just pass its name to lapply.

In our case the list is the column names of your data frame df. The function first creates the base plot, then checks whether the column it is looking at is a factor. If it is a factor, it creates a bar chart; otherwise it creates a histogram.

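library(ggplot2)

# Returns a bar chart for factor columns and a histogram for everything else.
# Relies on df from the question being defined in the calling environment.
histOrBar <- function(var) {
  basePlot <- ggplot(df, aes_string(var))
  if ( is.factor(df[[var]]) ) {
    basePlot + geom_bar()
  } else {
    basePlot + geom_histogram()
  }
}

# One plot per column, in column order
loDFs <- lapply(colnames(df), histOrBar)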
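
The question asks for two separate lists (factor plots and numeric plots), so here is a minimal sketch of one way to extend the idea above. It is an illustration rather than the only approach: the helper name hist_or_bar and its data argument are hypothetical, and it assumes ggplot2 3.0.0 or later so that the .data pronoun can be used inside aes() in place of aes_string().

```r
library(ggplot2)

# Hypothetical helper: takes the data frame as an argument instead of
# relying on a global df. Assumes ggplot2 >= 3.0.0 for the .data pronoun.
hist_or_bar <- function(data, var) {
  base_plot <- ggplot(data, aes(x = .data[[var]])) + xlab(var)
  if (is.factor(data[[var]])) {
    base_plot + geom_bar()
  } else {
    base_plot + geom_histogram(bins = 30)
  }
}

# Same setup as in the question
df <- iris
df$y <- sample(0:1, nrow(df), replace = TRUE)

# Split the column names by type, then build one list of plots per type
is_fact <- sapply(df, is.factor)
list_plotF <- lapply(names(df)[is_fact], hist_or_bar, data = df)
list_plotN <- lapply(names(df)[!is_fact], hist_or_bar, data = df)
```

Because data is passed to lapply by name, each column name fills the var argument. Printing list_plotF[[1]] or list_plotN[[1]] then draws the corresponding plot.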

2 Comments

Is it possible to pass df as an argument to apply? @Sahir Moosvi
There is a set of apply functions. The base apply function can take a data frame.

Consider passing column names with aes_string to better align x with df:

for (i in 1:length(df)){
    plot <- ggplot(df, aes_string(x=names(df)[i], color="y", fill="y")) + 
              xlab(names(df)[i]) 
    ...
}
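
As a rough illustration of why the aes() version can end up showing the same variable everywhere: aes() stores the expression df[,i] unevaluated, and i is only looked up when the plot is drawn, by which time the loop has finished. The base-R sketch below is an analogy, not ggplot2 code. It uses plain functions to show the same late lookup of a loop index, and the names late, fixed and j are made up for the example.

```r
# Base-R analogy (no ggplot2 needed): the loop index is looked up late
late <- list()
for (i in 1:3) {
  late[[i]] <- function() i          # i is NOT evaluated here
}
late_vals <- sapply(late, function(f) f())
late_vals                            # 3 3 3: every call sees the final i

# Capturing the current value in each iteration avoids the problem
fixed <- list()
for (i in 1:3) {
  fixed[[i]] <- local({
    j <- i                           # copy the value now
    function() j
  })
}
fixed_vals <- sapply(fixed, function(f) f())
fixed_vals                           # 1 2 3
```

aes_string() plays the role of the second loop: names(df)[i] is evaluated straight away, so each plot keeps its own column name.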

To demonstrate the problem using aes() and solution using aes_string() in OP's context, consider the following random data frame with columns of different data types: factor, char, int, num, bool, date.

Data

library(ggplot2)

set.seed(1152019)
alpha <- c(LETTERS, letters, c(0:9))
data_tools <- c("sas", "stata", "spss", "python", "r", "julia")

random_df <- data.frame(
  group = sample(data_tools, 500, replace=TRUE),
  int = as.numeric(sample(1:15, 500, replace=TRUE)),
  num = rnorm(500),
  char = replicate(500, paste(sample(LETTERS[1:2], 3, replace=TRUE), collapse="")),
  bool = as.numeric(sample(c(TRUE, FALSE), 500, replace=TRUE)),
  date = as.Date(sample(as.integer(as.Date('2019-01-01', origin='1970-01-01')):as.integer(Sys.Date()), 
                        500, replace=TRUE), origin='1970-01-01'),
  # R >= 4.0 no longer converts strings to factors by default; request it
  # explicitly so group and char are factors, as they were when this was written
  stringsAsFactors = TRUE
)

Graph

fact <- colnames(random_df)[sapply(random_df,is.factor)]
index_fact <- which(names(random_df) %in% fact)

i_F=1
i_N=1
list_plotN <- list()
list_plotF <- list()
plot <- NULL

for (i in 1:length(random_df)){
  # aes() VERSION
  #plot <- ggplot(random_df, aes(x=random_df[,i], color=group, fill=group)) +
  #  xlab(names(random_df)[i]) 

  # aes_string() VERSION
  plot <- ggplot(random_df, aes_string(x=names(random_df)[i], color="group", fill="group")) +
    xlab(names(random_df)[i]) 

  if (is.factor(random_df[,i])){
    p_factor <- plot + geom_bar()
    list_plotF[[i_F]] <- p_factor
    i_F=i_F+1
  }else{
    p_numeric <- plot + geom_histogram()
    list_plotN[[i_N]] <- p_numeric
    i_N=i_N+1
  }
}

Problem (using aes() where graph outputs DO NOT change according to type)

[Image: Problem Plots]

Solution (using aes_string() where graphs DO change according to type)

[Image: Solution Plots]

9 Comments

Really? What error do you get? Works great on my end.
Come to think of it, your problem is unclear. "It always have same vars." ... what does this mean? Without aes_string, the x-axis does not change between plots, which I assumed is your same-vars issue.
If you look at list_plotF (expected graphs for factor vars) and list_plotN (expected graphs for numerical vars), you can check that both lists are the same. Even plot is a list with all plots.
I edited showing the differences in using aes() and aes_string() in your context. I believe iris may not be a good example since all columns except species are numeric. See my random data (seeded for reproducibility) of columns of all types: char, num, int, bool, and date. See the problem and solution outputs.
What IDE are you using? RStudio? RGui? Rscript at command line? How are you viewing the plots?