58

My problem is similar to this one: when I generate plot objects (in this case histograms) in a loop, it seems that all of them are overwritten by the most recent plot.

To debug, within the loop, I am printing the index and the generated plot, both of which appear correctly. But when I look at the plots stored in the list, they are all identical except for the label.

(I'm using multiplot to make a composite image, but you get the same outcome if you print(myplots[[1]]) through print(myplots[[4]]) one at a time.)

Because I already have an attached dataframe (unlike the poster of the similar problem), I am not sure how to solve the problem.

(btw, the column classes are factor in the original dataset I am approximating here, but the same problem occurs if they are integer)

Here is a reproducible example:

library(ggplot2)
source("http://peterhaschke.com/Code/multiplot.R") #load multiplot function

#make sample data
col1 <- c(2, 4, 1, 2, 5, 1, 2, 0, 1, 4, 4, 3, 5, 2, 4, 3, 3, 6, 5, 3, 6, 4, 3, 4, 4, 3, 4, 
          2, 4, 3, 3, 5, 3, 5, 5, 0, 0, 3, 3, 6, 5, 4, 4, 1, 3, 3, 2, 0, 5, 3, 6, 6, 2, 3, 
          3, 1, 5, 3, 4, 6)
col2 <- c(2, 4, 4, 0, 4, 4, 4, 4, 1, 4, 4, 3, 5, 0, 4, 5, 3, 6, 5, 3, 6, 4, 4, 2, 4, 4, 4, 
          1, 1, 2, 2, 3, 3, 5, 0, 3, 4, 2, 4, 5, 5, 4, 4, 2, 3, 5, 2, 6, 5, 2, 4, 6, 3, 3, 
          3, 1, 4, 3, 5, 4)
col3 <- c(2, 5, 4, 1, 4, 2, 3, 0, 1, 3, 4, 2, 5, 1, 4, 3, 4, 6, 3, 4, 6, 4, 1, 3, 5, 4, 3, 
          2, 1, 3, 2, 2, 2, 4, 0, 1, 4, 4, 3, 5, 3, 2, 5, 2, 3, 3, 4, 2, 4, 2, 4, 5, 1, 3, 
          3, 3, 4, 3, 5, 4)
col4 <- c(2, 5, 2, 1, 4, 1, 3, 4, 1, 3, 5, 2, 4, 3, 5, 3, 4, 6, 3, 4, 6, 4, 3, 2, 5, 5, 4,
          2, 3, 2, 2, 3, 3, 4, 0, 1, 4, 3, 3, 5, 4, 4, 4, 3, 3, 5, 4, 3, 5, 3, 6, 6, 4, 2, 
          3, 3, 4, 4, 4, 6)
data2 <- data.frame(col1,col2,col3,col4)
data2[,1:4] <- lapply(data2[,1:4], as.factor)
colnames(data2)<- c("A","B","C", "D")

#generate plots
myplots <- list()  # new empty list
for (i in 1:4) {
  p1 <- ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+ 
    geom_histogram(fill="lightgreen") +
    xlab(colnames(data2)[ i])
  print(i)
  print(p1)
  myplots[[i]] <- p1  # add each plot into plot list
}
multiplot(plotlist = myplots, cols = 4)

When I look at a summary of a plot object in the plot list, this is what I see:

> summary(myplots[[1]])
data: A, B, C, D [60x4]
mapping:  x = data2[, i]
faceting: facet_null() 
-----------------------------------
geom_histogram: fill = lightgreen 
stat_bin:  
position_stack: (width = NULL, height = NULL)

I think that mapping: x = data2[, i] is the problem, but I am stumped! I can't post images, so you'll need to run my example and look at the graphs if my explanation of the problem is confusing.

Thanks!

2
  • link to multiplot is dead Commented Oct 8, 2019 at 11:49
  • The link works for me. I added a post with the graphs. Commented Apr 9, 2021 at 14:46

5 Answers

99

In addition to the other excellent answer, here’s a solution that uses “normal”-looking evaluation rather than eval. Since for loops have no separate variable scope (i.e. they are performed in the current environment), we need to use local to wrap the body of the for loop; in addition, we need to make i a local variable, which we can do by re-assigning it to its own name [1]:

myplots <- vector('list', ncol(data2))

for (i in seq_along(data2)) {
    message(i)
    myplots[[i]] <- local({
        i <- i
        ggplot(data2, aes(x = data2[[i]])) +
            geom_histogram(fill = "lightgreen") +
            xlab(colnames(data2)[i])
    })
}

However, an altogether cleaner way is to forgo the for loop entirely and use list functions to build the result. This works in several possible ways. The following is the easiest in my opinion:

plot_data_column = function (data, column) {
    ggplot(data, aes_string(x = column)) +
        geom_histogram(fill = "lightgreen") +
        xlab(column)
}

myplots <- lapply(colnames(data2), plot_data_column, data = data2)

This has several advantages: it’s simpler, and it won’t clutter the environment (with the loop variable i).
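As a side note, aes_string() has been soft-deprecated since ggplot2 3.0.0. The following is a minimal sketch of the same lapply approach using the .data pronoun instead. It assumes a reasonably recent ggplot2; demo_data and plot_column are illustrative names, not part of the question, and geom_bar is used because the columns are factors.

```r
library(ggplot2)

# Small illustrative data frame with factor columns (not the question's data)
demo_data <- data.frame(
  A = factor(c(1, 2, 2, 3)),
  B = factor(c(0, 0, 1, 3))
)

# Hypothetical helper: `column` is a column name given as a string.
# Each call has its own environment, so `column` is captured per plot.
plot_column <- function(data, column) {
  ggplot(data, aes(x = .data[[column]])) +
    geom_bar(fill = "lightgreen") +
    xlab(column)
}

demo_plots <- lapply(colnames(demo_data), plot_column, data = demo_data)
```

Each element of demo_plots can then be printed or passed to multiplot in the same way as myplots.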


[1] This might seem confusing: why does i <- i have any effect at all? Because by performing the assignment we create a new, local variable with the same name as the variable in the outer scope. We could equally have used a different name, e.g. local_i <- i.
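The effect of i <- i inside local can be seen without ggplot2 at all. Here is a rough base-R sketch using closures as a stand-in for plot objects (the names shared_fns and local_fns are made up for illustration). The mechanism is analogous to, not identical with, what ggplot2 does internally.

```r
# Closures created directly in a for loop all look up the same `i`,
# because the loop runs in the current environment.
shared_fns <- list()
for (i in 1:3) {
  shared_fns[[i]] <- function() i   # `i` is looked up only when called
}
shared_values <- vapply(shared_fns, function(f) f(), integer(1))

# Wrapping the body in local() and re-assigning `i` gives each closure
# its own environment with its own copy of `i`.
local_fns <- list()
for (i in 1:3) {
  local_fns[[i]] <- local({
    i <- i          # copy the current value into the new local environment
    function() i    # this closure finds the local copy first
  })
}
local_values <- vapply(local_fns, function(f) f(), integer(1))

print(shared_values)  # 3 3 3
print(local_values)   # 1 2 3
```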


9 Comments

Thank you so much, especially for the lapply version; I wanted to functionalize this but couldn't figure it out, and decided to do a (superficially easier, actually horrible) for loop. I figured it was a variable scope problem, I am often fighting them in R!
Both these solutions are unwieldy. For some reason, myplots burgeons to GB's per iteration in my environment. This happens with both the local method and the function/lapply method.
@BigTimeStats Well that’s an issue with having many very big plots, not with either of these solutions. A common solution is to subsample the number of data points you plot (often, such big plots won’t reliably display all individual data points anyway), or to compute summary statistics ahead of plotting (and plot these rather than the raw data). But sometimes neither works. In that case, the only solution is to avoid having multiple plots in memory at once.
@BigTimeStats The estimate in the environment pane is notoriously unreliable. A large part of the reason is that it estimates each object’s size individually but lots of objects in R (particularly data frames) share memory: if you create one data frame from another by modifying one column, then they will share the memory for all remaining columns.
@M-- Actually neither really has any place there. I left the print() in from OP’s code (OP seems to want to display the current plot in each loop iteration!), but I think it’s misplaced here (and so is invisible(), which has no effect here).
23

Because of all the quoting of expressions that get passed around, the i that is evaluated at the end of the loop is whatever i happens to be at that time, which is its final value. You can get around this by eval(substitute(ing in the right value during each iteration.

myplots <- list()  # new empty list
for (i in 1:4) {
    p1 <- eval(substitute(
        ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+ 
          geom_histogram(fill="lightgreen") +
          xlab(colnames(data2)[ i])
    ,list(i = i)))
    print(i)
    print(p1)
    myplots[[i]] <- p1  # add each plot into plot list
}
multiplot(plotlist = myplots, cols = 4)
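The same idea can be illustrated in base R, without ggplot2. This is only a sketch: the vector x and the list names are invented for the example. A quoted expression keeps the symbol i and resolves it when it is finally evaluated, whereas substitute replaces the symbol with its value at the time of each iteration.

```r
x <- c(10, 20, 30)
env <- environment()   # environment in which the expressions are evaluated later

exprs_quoted <- list()
exprs_substituted <- list()
for (i in 1:3) {
  # Stores the call x[i] with the symbol `i` still unresolved
  exprs_quoted[[i]] <- quote(x[i])
  # Stores x[1L], x[2L], x[3L]: `i` is replaced by its current value
  exprs_substituted[[i]] <- substitute(x[i], list(i = i))
}

# Evaluate everything after the loop, when `i` holds its final value (3)
quoted_values <- vapply(exprs_quoted, function(e) eval(e, env), numeric(1))
substituted_values <- vapply(exprs_substituted, function(e) eval(e, env), numeric(1))

print(quoted_values)       # 30 30 30
print(substituted_values)  # 10 20 30
```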

6 Comments

The diagnosis is correct but the solution is somewhat convoluted. It’s easier to capture i in a local context. The problem is that for loops in R have no scope so you need to use local instead: for (i in 1:4) local({i = i; … rest of the loop … }). The self-assignment i = i isn’t by accident — this is actually needed. A different variable name can also be used. Regardless, all this would be unnecessary by using “proper” list functions instead of for, which is frankly a bad language construct in R.
@KonradRudolph local is nice
Ah, I forgot something: if local is used, the assignment to myplots[[i]] needs to use the <<- operator instead of local assignment.
@KonradRudolph any chance you want to add a solution using one of the apply functions? It seems that in that case a substitution or local would also be required? Also, is there a reason that local is better than the substitute way?
I prefer local because it looks like it’s performing standard evaluation (although that’s not the case of course). It hides the evals and substitutes away. In fact neither lapply nor for really needs to capture the variable i if column names are used in the aesthetics. I’ll add an answer.
5

I have run the code in the question and in the answer, changing geom_histogram to geom_bar to avoid the error: Error: StatBin requires a continuous x variable.

Here is the code with the visualizations:

Question

#generate plots
myplots <- list()  # new empty list
for (i in 1:4) {
  p1 <- ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+ 
    geom_bar(fill="lightgreen") +
    xlab(colnames(data2)[ i])
  print(i)
  print(p1)
  myplots[[i]] <- p1  # add each plot into plot list
}

multiplot(plotlist = myplots, cols = 4)
#> Loading required package: grid

Answer

myplots <- vector('list', ncol(data2))

for (i in seq_along(data2)) {
    message(i)
    myplots[[i]] <- local({
        i <- i
        p1 <- ggplot(data2, aes(x = data2[[i]])) +
            geom_bar(fill = "lightgreen") +
            xlab(colnames(data2)[i])
        print(p1)
    })
}

multiplot(plotlist = myplots, cols = 4)

Same result using lapply:


plot_data_column = function (data, column) {
    ggplot(data, aes_string(x = column)) +
        geom_bar(fill = "lightgreen") +
        xlab(column)
}

myplots <- lapply(colnames(data2), plot_data_column, data = data2)

multiplot(plotlist = myplots, cols = 4)
#> Loading required package: grid

Created on 2021-04-09 by the reprex package (v0.3.0)


1

Using lapply works too, because each call to the anonymous function has its own environment in which x exists (using mtcars as data):

library(ggplot2)
library(ggthemes)  # provides theme_wsj() and scale_colour_wsj()

plot <- lapply(seq_len(ncol(mtcars)), FUN = function(x) {
  ggplot(data = mtcars) + 
    geom_line(aes(x = mpg, y = mtcars[ , x]), size = 1.4, color = "midnightblue", inherit.aes = FALSE) +
    labs(x="Date", y="Value", title = "Revisions 1M", subtitle = colnames(mtcars)[x]) +
    theme_wsj() +
    scale_colour_wsj("colors6")
})


0

Here is another solution:

#generate plots
myplots <- list()  # new empty list
for (col in colnames(data2)) {
  p1 <- ggplot(data=data.frame(data2),aes(x=!!ensym(col)))+ 
    geom_bar(fill="lightgreen") +
    xlab(col)
  myplots[[col]] <- p1  # add each plot into plot list
}

multiplot(plotlist = myplots, cols = 4)
#> Loading required package: grid

