Storing ggplot objects in a list from within loop in R

Question

My problem is similar to this one; when I generate plot objects (in this case histograms) in a loop, it seems that all of them get overwritten by the most recent plot.

To debug, within the loop, I am printing the index and the generated plot, both of which appear correctly. But when I look at the plots stored in the list, they are all identical except for the label.

(I'm using multiplot to make a composite image, but you get the same outcome if you print(myplots[[1]]) through print(myplots[[4]]) one at a time.)

Because I already have an attached dataframe (unlike the poster of the similar problem), I am not sure how to solve the problem.

(By the way, the column classes are factor in the original dataset I am approximating here, but the same problem occurs if they are integer.)

Here is a reproducible example:

library(ggplot2)
source("http://peterhaschke.com/Code/multiplot.R") #load multiplot function

#make sample data
col1 <- c(2, 4, 1, 2, 5, 1, 2, 0, 1, 4, 4, 3, 5, 2, 4, 3, 3, 6, 5, 3, 6, 4, 3, 4, 4, 3, 4, 
          2, 4, 3, 3, 5, 3, 5, 5, 0, 0, 3, 3, 6, 5, 4, 4, 1, 3, 3, 2, 0, 5, 3, 6, 6, 2, 3, 
          3, 1, 5, 3, 4, 6)
col2 <- c(2, 4, 4, 0, 4, 4, 4, 4, 1, 4, 4, 3, 5, 0, 4, 5, 3, 6, 5, 3, 6, 4, 4, 2, 4, 4, 4, 
          1, 1, 2, 2, 3, 3, 5, 0, 3, 4, 2, 4, 5, 5, 4, 4, 2, 3, 5, 2, 6, 5, 2, 4, 6, 3, 3, 
          3, 1, 4, 3, 5, 4)
col3 <- c(2, 5, 4, 1, 4, 2, 3, 0, 1, 3, 4, 2, 5, 1, 4, 3, 4, 6, 3, 4, 6, 4, 1, 3, 5, 4, 3, 
          2, 1, 3, 2, 2, 2, 4, 0, 1, 4, 4, 3, 5, 3, 2, 5, 2, 3, 3, 4, 2, 4, 2, 4, 5, 1, 3, 
          3, 3, 4, 3, 5, 4)
col4 <- c(2, 5, 2, 1, 4, 1, 3, 4, 1, 3, 5, 2, 4, 3, 5, 3, 4, 6, 3, 4, 6, 4, 3, 2, 5, 5, 4,
          2, 3, 2, 2, 3, 3, 4, 0, 1, 4, 3, 3, 5, 4, 4, 4, 3, 3, 5, 4, 3, 5, 3, 6, 6, 4, 2, 
          3, 3, 4, 4, 4, 6)
data2 <- data.frame(col1,col2,col3,col4)
data2[,1:4] <- lapply(data2[,1:4], as.factor)
colnames(data2)<- c("A","B","C", "D")

#generate plots
myplots <- list()  # new empty list
for (i in 1:4) {
  p1 <- ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+ 
    geom_histogram(fill="lightgreen") +
    xlab(colnames(data2)[ i])
  print(i)
  print(p1)
  myplots[[i]] <- p1  # add each plot into plot list
}
multiplot(plotlist = myplots, cols = 4)

When I look at a summary of a plot object in the plot list, this is what I see

> summary(myplots[[1]])
data: A, B, C, D [60x4]
mapping:  x = data2[, i]
faceting: facet_null() 
-----------------------------------
geom_histogram: fill = lightgreen 
stat_bin:  
position_stack: (width = NULL, height = NULL)

I think that mapping: x = data2[, i] is the problem, but I am stumped! I can't post images, so you'll need to run my example and look at the graphs if my explanation of the problem is confusing.

Thanks!

link to multiplot is dead
– baxx, Commented Oct 8, 2019 at 11:49

The link works for me. I added a post with the graphs.
– Emy, Commented Apr 9, 2021 at 14:46

Konrad Rudolph · Accepted Answer · 2023-11-07 22:50:04Z

99

In addition to the other excellent answer, here’s a solution that uses “normal”-looking evaluation rather than eval. Since for loops have no separate variable scope (i.e. they are performed in the current environment) we need to use local to wrap the for block; in addition, we need to make i a local variable — which we can do by re-assigning it to its own name¹:

myplots <- vector('list', ncol(data2))

for (i in seq_along(data2)) {
    message(i)
    myplots[[i]] <- local({
        i <- i
        ggplot(data2, aes(x = data2[[i]])) +
            geom_histogram(fill = "lightgreen") +
            xlab(colnames(data2)[i])
    })
}
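The effect of local and of the i <- i line can be checked without ggplot2. The sketch below is an illustration only and not part of the original answer: it stores closures instead of plots. Like the mapping inside aes(), a closure looks up i only when it is eventually used. The names shared and isolated are made up for this example.

```r
# Illustration only: closures stand in for the lazily evaluated aes() mapping.

# Without local(): every closure shares the one loop variable i.
shared <- list()
for (i in 1:3) {
  shared[[i]] <- function() i
}

# With local(): each iteration gets its own environment with a private i.
isolated <- list()
for (i in 1:3) {
  isolated[[i]] <- local({
    i <- i  # copy the current value into the local environment
    function() i
  })
}

shared_values <- vapply(shared, function(f) f(), integer(1))
isolated_values <- vapply(isolated, function(f) f(), integer(1))

print(shared_values)    # every element is the final value of i
print(isolated_values)  # each element keeps the value from its own iteration
```

The plots in the question behave like shared: they all resolve i when they are drawn, by which time the loop has finished.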

However, an altogether cleaner way is to forgo the for loop entirely and use list functions to build the result. This works in several possible ways. The following is the easiest in my opinion:

plot_data_column = function (data, column) {
    ggplot(data, aes_string(x = column)) +
        geom_histogram(fill = "lightgreen") +
        xlab(column)
}

myplots <- lapply(colnames(data2), plot_data_column, data = data2)

This has several advantages: it’s simpler, and it won’t clutter the environment (with the loop variable i).
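Note that aes_string() is deprecated in recent ggplot2 releases in favour of tidy evaluation. Below is a minimal sketch of the same helper using the .data pronoun instead, assuming a reasonably recent ggplot2. The small demo data frame is made up for illustration and stands in for data2. geom_bar is used so that the sketch works for both discrete and continuous columns.

```r
library(ggplot2)

# Made-up demo data, standing in for data2 from the question.
demo <- data.frame(a = c(1, 1, 2), b = c(1, 2, 2), c = c(3, 3, 3))

# Same helper as above, but the column is looked up by name via .data,
# so nothing depends on a loop index when the plot is drawn.
plot_data_column <- function(data, column) {
  ggplot(data, aes(x = .data[[column]])) +
    geom_bar(fill = "lightgreen") +
    xlab(column)
}

myplots <- lapply(colnames(demo), plot_data_column, data = demo)
names(myplots) <- colnames(demo)  # optional: allows myplots$a, myplots$b, ...
```

Each call to plot_data_column has its own environment holding column, so every plot keeps a reference to the right column name.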

¹ This might seem confusing: why does i <- i have any effect at all? — Because by performing the assignment we create a new, local variable with the same name as the variable in the outer scope. We could equally have used a different name, e.g. local_i <- i.

edited Nov 7, 2023 at 22:50

answered Aug 13, 2015 at 17:12

Konrad Rudolph


9 Comments

LizPS Over a year ago

Thank you so much, especially for the lapply version; I wanted to functionalize this but couldn't figure it out, and decided to do (superficially easier, actually horrible) for loop. I figured it was a variable scope problem, I am often fighting them in R!

BigTimeStats Over a year ago

Both these solutions are unwieldy. For some reason, myplots grows by gigabytes per iteration in my environment, with both the local method and the function/lapply method.

Konrad Rudolph Over a year ago

@BigTimeStats Well that’s an issue with having many very big plots, not with either of these solutions. A common solution is to subsample the number of data points you plot (often, such big plots won’t reliably display all individual data points anyway), or to compute summary statistics ahead of plotting (and plot these rather than the raw data). But sometimes neither works. In that case, the only solution is to avoid having multiple plots in memory at once.

Konrad Rudolph Over a year ago

@BigTimeStats The estimate in the environment pane is notoriously unreliable. A large part of the reason is that it estimates each object’s size individually but lots of objects in R (particularly data frames) share memory: if you create one data frame from another by modifying one column, then they will share the memory for all remaining columns.

Konrad Rudolph Over a year ago

@M-- Actually neither really has any place there. I left the print() in from OP’s code (OP seems to want to display the current plot in each loop iteration!), but I think it’s misplaced here (and so is invisible(), which has no effect here).


Rorschach · Accepted Answer · 2015-08-13 16:48:59Z

23

Because of all the quoting of expressions that get passed around, the i that is evaluated at the end of the loop is whatever i happens to be at that time, which is its final value. You can get around this by eval(substitute(ing in the right value during each iteration.

myplots <- list()  # new empty list
for (i in 1:4) {
    p1 <- eval(substitute(
        ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+ 
          geom_histogram(fill="lightgreen") +
          xlab(colnames(data2)[ i])
    ,list(i = i)))
    print(i)
    print(p1)
    myplots[[i]] <- p1  # add each plot into plot list
}
multiplot(plotlist = myplots, cols = 4)
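To see what substitute is doing in the loop above, it can help to look at the expression it produces. Here is a small base R sketch; the demo data frame and the variable names are made up for illustration.

```r
# Made-up data frame for illustration.
demo <- data.frame(A = 1:3, B = 4:6)

i <- 2
# substitute() replaces the symbol i with its current value inside the
# unevaluated expression, so the result no longer refers to i at all.
expr <- substitute(demo[, i], list(i = i))
expr_text <- deparse(expr)
print(expr_text)

# Changing i afterwards has no effect on the captured expression.
i <- 1
column <- eval(expr)
print(column)
```

The same thing happens to the whole ggplot(...) call in the answer: the value of i is baked into the expression before it is evaluated, so the stored mapping contains a fixed column index rather than a reference to i.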

answered Aug 13, 2015 at 16:48

Rorschach


6 Comments

Konrad Rudolph Over a year ago

The diagnosis is correct but the solution is somewhat convoluted. It’s easier to capture i in a local context. The problem is that for loops in R have no scope so you need to use local instead: for (i in 1:4) local({i = i; … rest of the loop … }). The self-assignment i = i isn’t by accident — this is actually needed. A different variable name can also be used. Regardless, all this would be unnecessary by using “proper” list functions instead of for, which is frankly a bad language construct in R.

Rorschach Over a year ago

@KonradRudolph local is nice

Konrad Rudolph Over a year ago

Ah, I forgot something: if local is used, the assignment to myplots[[i]] needs to use the <<- operator instead of local assignment.

Rorschach Over a year ago

@KonradRudolph any chance you want to add a solution using one of the apply functions? It seems that in that case a substitution or local would also be required? Also, is there a reason that local is better than the substitute way?

Konrad Rudolph Over a year ago

I prefer local because it looks like it’s performing standard evaluation (although that’s not the case of course). It hides the evals and substitutes away. In fact neither lapply nor for really needs to capture the variable i if column names are used in the aesthetics. I’ll add an answer.


Emy · Accepted Answer · 2021-04-09 14:56:17Z

I have run the code in the question and in the answer, changing geom_histogram to geom_bar to avoid the error: Error: StatBin requires a continuous x variable.

Here is the code with the visualizations:

Question

#generate plots
myplots <- list()  # new empty list
for (i in 1:4) {
  p1 <- ggplot(data=data.frame(data2),aes(x=data2[ ,i]))+ 
    geom_bar(fill="lightgreen") +
    xlab(colnames(data2)[ i])
  print(i)
  print(p1)
  myplots[[i]] <- p1  # add each plot into plot list
}

multiplot(plotlist = myplots, cols = 4)
#> Loading required package: grid

Answer

myplots <- vector('list', ncol(data2))

for (i in seq_along(data2)) {
    message(i)
    myplots[[i]] <- local({
        i <- i
        p1 <- ggplot(data2, aes(x = data2[[i]])) +
            geom_bar(fill = "lightgreen") +
            xlab(colnames(data2)[i])
        print(p1)
    })
}

multiplot(plotlist = myplots, cols = 4)

Same result using lapply:


plot_data_column = function (data, column) {
    ggplot(data, aes_string(x = column)) +
        geom_bar(fill = "lightgreen") +
        xlab(column)
}

myplots <- lapply(colnames(data2), plot_data_column, data = data2)

multiplot(plotlist = myplots, cols = 4)
#> Loading required package: grid

Created on 2021-04-09 by the reprex package (v0.3.0)

Paul van Oppen · Accepted Answer · 2020-08-19 06:40:42Z

1

Using lapply works too as x exists within the anonymous function environment (using mtcars as data):

library(ggplot2)
library(ggthemes)  # provides theme_wsj() and scale_colour_wsj()

plot <- lapply(seq_len(ncol(mtcars)), FUN = function(x) {
  ggplot(data = mtcars) + 
    geom_line(aes(x = mpg, y = mtcars[ , x]), size = 1.4, color = "midnightblue", inherit.aes = FALSE) +
    labs(x="Date", y="Value", title = "Revisions 1M", subtitle = colnames(mtcars)[x]) +
    theme_wsj() +
    scale_colour_wsj("colors6")
})

answered Aug 19, 2020 at 6:40

Paul van Oppen


Avish · Accepted Answer · 2023-05-10 09:38:32Z

0

Here is another solution:

#generate plots
myplots <- list()  # new empty list
for (col in colnames(data2)) {
  p1 <- ggplot(data=data.frame(data2),aes(x=!!ensym(col)))+ 
    geom_bar(fill="lightgreen") +
    xlab(col)
  myplots[[col]] <- p1  # add each plot into plot list
}

multiplot(plotlist = myplots, cols = 4)
#> Loading required package: grid

answered May 10, 2023 at 9:38

Avish
