ggplot2 generates the same plot for different variables in a for loop

Question

I am trying to loop through a list of columns that contain numbers only. On each iteration, I convert the column from character to numeric and then attempt to plot it. A basic example of my code is:

library(ggtree) 
library(treeio)
library(tidyverse)
library(ggnewscale)
library(ggtreeExtra)
library(argparse)
library(RColorBrewer)
library(rlist)
library(stringr)

tree <- read.tree("/...") #PLEASE REPLACE THIS WITH THE LOCATION TO 'tree_newick.nwk'

tipcategories = read.csv("....", # PLEASE REPLACE THIS WITH THE LOCATION TO 'plot.tsv'
                     sep = " ",
                     header = TRUE,
                     stringsAsFactors = FALSE)

dd = as.data.frame(tipcategories)

p <- ggtree(tree) + ylim(-1, NA) + theme_tree2() 

p <- p %<+% dd + geom_tiplab(size=1)   

n <- 60
qual_col_pals = brewer.pal.info[brewer.pal.info$category == 'qual',]
col_vector = unlist(mapply(brewer.pal, qual_col_pals$maxcolors, 
rownames(qual_col_pals)))

columns = c("Column1", "Column2")

for (col in columns) {

  p <- p + new_scale_fill()

  dd[[col]] <- as.numeric(as.character(dd[[col]]))

  p <- p + geom_fruit(geom=geom_tile, mapping=aes(fill=dd[[col]]), width=2, offset=0.05) +
    scale_fill_continuous(name=col, low='blue', high='red')

}

p <- p + theme(legend.text = element_text(size = 5), legend.key.size = unit(0.3, 'cm'))

ggsave("....") # PLEASE REPLACE THIS WITH WHERE YOU WANT TO SAVE IT

The tree data (please save it to a file and put that file's path in place of the dots in read.tree):

(((((((A:4,B:4):6,C:5):8,D:6):3,E:21):10,((F:4,G:12):14,H:8):13):13,((I:5,J:2):30,(K:11,L:11):2):17):4,M:56);

The metadata file (please save it to a file and put that file's path in place of the dots in read.csv):

Accession1 Column1 Column2   
A 10 130
B 20 120
C 30 110 
D 40 100
E 50 90
F 60 80 
G 70 70
H 80 60
I 90 50
J 100 40
K 110 30
L 120 20
M 130 10

The above works fine for just one column. However, when I try to plot two columns, the second column always overwrites the first, and the first ends up looking exactly the same as the second. The image below shows the result of running the program normally.

The first column (Column1) is actually supposed to look like this:

Could anyone provide help as to how to fix this?

It's very hard to help without sample data to test with. See stackoverflow.com/questions/5963269/…
– MrFlick, Commented Apr 14, 2021 at 5:37
Hi, I apologise for that, as I didn't know what you meant. I hope it is now reproducible (please let me know otherwise).
– Yasir, Commented Apr 14, 2021 at 6:54
You can dput the data to make it easier to copy and paste into an R script ;)
– Sinh Nguyen, Commented Apr 14, 2021 at 7:08

Sinh Nguyen · Accepted Answer · 2021-04-15 03:16:10Z

It really took a lot of time to reproduce your case, as you use so many packages that I don't :)

Explanation of the issue: ggplot does not render anything at the moment you add a geom and pass it data and an aes mapping. It only stores the expression you wrote inside aes(), and that expression is evaluated when the plot is finally rendered. In your case you pass the expression dd[[col]]. The value of col changes on each pass through the for loop, but both layers store the same expression, so at render time both are evaluated with the last value of col, which is Column2, and you get two bars showing the same data. You can verify this by swapping the order of the columns so that Column1 comes last; you will then see two bars of Column1 instead.
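
A minimal base-R sketch of the same mechanism, for illustration only. Here quote() stands in for the way aes() keeps an expression for later, and bquote() plays the role of injecting the current value. The toy data frame and the names captured and fixed are made up for this example; this is not how ggplot2 is implemented internally.

```r
# Toy data, assumed for illustration
dd <- data.frame(Column1 = c(10, 20, 30), Column2 = c(30, 20, 10))

captured <- list()  # expressions stored as written
fixed <- list()     # expressions with the current value of col baked in

for (col in c("Column1", "Column2")) {
  # quote() stores the expression dd[[col]] itself, not its value
  captured[[col]] <- quote(dd[[col]])
  # bquote() replaces .(col) with the string held in col right now
  fixed[[col]] <- bquote(dd[[.(col)]])
}

# After the loop col is "Column2", so the stored expression for Column1
# is evaluated against Column2
late_first <- eval(captured[["Column1"]])

# The bquote() version still points at Column1
fixed_first <- eval(fixed[["Column1"]])

identical(late_first, dd$Column2)
identical(fixed_first, dd$Column1)
```

Both comparisons should come out TRUE: the plain stored expression follows the loop variable, while the injected one does not.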

Solution: create a unique reference for each iteration of the loop

Initial setup with data in dput format

library(ggtree)
library(treeio)
library(tidyverse)
library(ggnewscale)
library(ggtreeExtra)
library(argparse)
library(RColorBrewer)
library(rlist)
library(stringr)

tree <- structure(list(edge = structure(c(14L, 15L, 16L, 17L, 18L, 19L, 
  20L, 20L, 19L, 18L, 17L, 16L, 21L, 22L, 22L, 21L, 15L, 23L, 24L, 
  24L, 23L, 25L, 25L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 1L, 2L, 
  3L, 4L, 5L, 21L, 22L, 6L, 7L, 8L, 23L, 24L, 9L, 10L, 25L, 11L, 
  12L, 13L), .Dim = c(24L, 2L)), edge.length = c(4, 13, 10, 3, 
    8, 6, 4, 4, 5, 6, 21, 13, 14, 4, 12, 8, 17, 30, 5, 2, 2, 11, 
    11, 56), Nnode = 12L, tip.label = c("A", "B", "C", "D", "E", 
      "F", "G", "H", "I", "J", "K", "L", "M")), class = "phylo",
  order = "cladewise")

tipcategories <- structure(
  list(Accession1 = c("A", "B", "C", "D", "E", "F", "G", 
    "H", "I", "J", "K", "L", "M"), Column1 = c(10L, 20L, 30L, 40L, 
      50L, 60L, 70L, 80L, 90L, 100L, 110L, 120L, 130L), Column2 = c(130L, 
        120L, 110L, 100L, 90L, 80L, 70L, 60L, 50L, 40L, 30L, 20L, 10L
      ), X = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
    X.1 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
    ), X.2 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
      NA)), class = "data.frame", row.names = c(NA, -13L))

Your code with the plot generation, modified so that the layers no longer share one variable, which is what caused the issue in the OP

dd <- as.data.frame(tipcategories)

p <- ggtree(tree) + ylim(-1, NA) + theme_tree2()

p <- p %<+% dd + geom_tiplab(size = 1)

n <- 60
qual_col_pals <- brewer.pal.info[brewer.pal.info$category == "qual", ]
col_vector <- unlist(mapply(
  brewer.pal, qual_col_pals$maxcolors,
  rownames(qual_col_pals)
))

columns <- c("Column1", "Column2")

for (col in columns) {
  p <- p + new_scale_fill()

  # assign the values of dd[[col]] to a new variable named after the column
  assign(col, as.numeric(as.character(dd[[col]])))

  # use bang-bang (!!) and sym() to insert that variable's name into the aes()
  # call; each layer then refers to a different variable when the plot is
  # finally rendered at the end
  p <- p + geom_fruit(geom = geom_tile, mapping = aes(fill = !!sym(col)),
    width = 2, offset = 0.05) +
    scale_fill_continuous(name = col, low = "blue", high = "red")
}

p <- p + theme(legend.text = element_text(size = 5),
  legend.key.size = unit(0.3, "cm"))

p

Created on 2021-04-15 by the reprex package (v2.0.0)
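
For readers who want to check the effect without the tree packages, here is a small sketch in plain ggplot2. It assumes a toy data frame and uses geom_point in place of geom_fruit, so it only illustrates the difference between aes(dd[[col]]) and aes(!!sym(col)); it has not been tested with ggtreeExtra. The names lazy, eager, lazy_x and eager_x are made up for this example.

```r
library(ggplot2)
library(rlang)

# Toy data, assumed for illustration
dd <- data.frame(
  id = c("A", "B", "C"),
  Column1 = c(10, 20, 30),
  Column2 = c(30, 20, 10)
)

lazy <- ggplot(dd, aes(y = id))
eager <- ggplot(dd, aes(y = id))

for (col in c("Column1", "Column2")) {
  # stores the expression dd[[col]]; col is looked up when the plot is built
  # (ggplot2 may warn that this style is discouraged)
  lazy <- lazy + geom_point(aes(x = dd[[col]]))
  # injects the symbol Column1 or Column2 right away
  eager <- eager + geom_point(aes(x = !!sym(col)))
}

# Build the plots without drawing them and pull out the x values per layer
lazy_x <- lapply(ggplot_build(lazy)$data, function(d) d$x)
eager_x <- lapply(ggplot_build(eager)$data, function(d) d$x)

identical(lazy_x[[1]], lazy_x[[2]])    # both layers ended up with the same data
identical(eager_x[[1]], eager_x[[2]])  # layers kept their own columns
```

The first comparison should be TRUE and the second FALSE, which matches the behaviour described in the question.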

answered Apr 15, 2021 at 3:16

Sinh Nguyen


6 Comments

Sinh Nguyen Over a year ago

In one ggplot you can only have one scale type per aesthetic, so if you already have a continuous fill you cannot have a discrete fill on that same plot. If you ask another question with more detail about what you want to achieve, it would be easier to discuss what can be done.

Yasir Over a year ago

In the past I have had success with manually plotting different types of scales (using new_scale_fill()). I am unsure as to why it is not working now. Regardless, thank you so much for your reply, the information you have provided is very useful.

Sinh Nguyen Over a year ago

I hadn't used the ggnewscale package much before encountering your question. I think it may work out. You can then just try switching between scale_fill_continuous and scale_fill_discrete or scale_fill_manual based on the current col.

Yasir Over a year ago

Sorry, but what if there is a letter in the column? I tried to convert the letters into numbers before assign(col...,), but the error that came up was "Discrete value supplied to continuous scale". I know this is due to the letters. How do I fix this issue?

Sinh Nguyen Over a year ago

Without the data it is really difficult to confirm whether the proposal would work. As the OP is about graphing, which has already been answered, I think it would be best for you to ask a new question narrowed down to the specific challenge you are having, so that others can support you better.
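
As a rough sketch of the idea raised in these comments (choosing the fill scale from the type of the current column), something like the following might serve as a starting point. The helper pick_fill_scale and the column name Column3 are hypothetical, and the sketch has not been tried together with new_scale_fill() or geom_fruit.

```r
library(ggplot2)

# Hypothetical helper: return a fill scale that matches the column's type
pick_fill_scale <- function(values, name) {
  if (is.numeric(values)) {
    # numeric columns get the blue-to-red gradient used in the question
    scale_fill_gradient(name = name, low = "blue", high = "red")
  } else {
    # character or factor columns get a discrete scale instead
    scale_fill_discrete(name = name)
  }
}

numeric_scale <- pick_fill_scale(c(10, 20, 30), "Column1")
letter_scale <- pick_fill_scale(c("a", "b", "a"), "Column3")

inherits(numeric_scale, "ScaleContinuous")
inherits(letter_scale, "ScaleDiscrete")
```

Inside the loop, the scale would then be picked from the column as it is, without forcing it through as.numeric(), so that columns holding letters do not reach a continuous scale.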
