I have multiple files that I want to plot as volcano plots. All of the files are in one folder.

Objective: I would like to read them in as a list and then pass each data set to a plotting function, so that I get one plot per file.

The function which I would like to use is this:

EnhancedVolcano(res1, lab = rownames(res1), x = "log2FoldChange", y = "padj",
                #selectLab = c("APOBEC3B","CHD7","AURKB","EYA1","UHRF1","SFMBT1"),
                xlim = c(-8, 8),
                xlab = bquote(~Log[2]~ "fold change"),
                ylab = bquote(~-Log[10]~adjusted~italic(P)),
                transcriptPointSize = 10,
                transcriptLabSize = 10,
                border = "full",
                pCutoff = 0.05,
                #legendPosition = "bottom",
                borderWidth = 1.5,
                legend = c('NS', 'Log2 FC', 'Adjusted p-value',
                           'Adjusted p-value & Log2 FC'),
                legendPosition = 'bottom',
                legendLabSize = 20,
                legendIconSize = 20,
                borderColour = "blue",
                #drawConnectors = FALSE,
                #widthConnectors = 0.01,
                colConnectors = 'grey30',
                gridlines.major = FALSE,
                gridlines.minor = FALSE)

This is the list of files which I intend to use:

M0_vs_M1_TCGA_stages.txt  M0_vs_M4_TCGA_stages.txt  M1_vs_M3_TCGA_stages.txt  M2_vs_M3_TCGA_stages.txt  M3_vs_M4_TCGA_stages.txt
M0_vs_M2_TCGA_stages.txt  M0_vs_M5_TCGA_stages.txt  M1_vs_M4_TCGA_stages.txt  M2_vs_M4_TCGA_stages.txt  M3_vs_M5_TCGA_stages.txt
M0_vs_M3_TCGA_stages.txt  M1_vs_M2_TCGA_stages.txt  M1_vs_M5_TCGA_stages.txt  M2_vs_M5_TCGA_stages.txt  M4_vs_M5_TCGA_stages.txt
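
The question does not show how the list `all_csv` used further down was built, so here is a minimal sketch of one way to read a folder of files into a named list of data frames. It assumes the files are tab-separated with a header row (the actual format is not stated). To stay runnable on its own, it first writes two tiny stand-in files to a temporary folder; `data_dir` and the toy values are made up for illustration.

```r
# Stand-in for the real folder: write two tiny tab-separated files to a temp dir.
# (Assumption: the real *_TCGA_stages.txt files are tab-separated with a header.)
data_dir <- file.path(tempdir(), "tcga_stages_demo")
dir.create(data_dir, showWarnings = FALSE)
toy <- data.frame(Symbol = c("TSPAN6", "CFH"),
                  log2FoldChange = c(1.72, -1.84),
                  padj = c(0.0133, 0.0083))
for (nm in c("M0_vs_M1_TCGA_stages", "M0_vs_M2_TCGA_stages")) {
  write.table(toy, file.path(data_dir, paste0(nm, ".txt")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# List the files, read each one, and name the list elements after the files.
files <- list.files(data_dir, pattern = "_TCGA_stages\\.txt$", full.names = TRUE)
all_csv <- lapply(files, read.delim, stringsAsFactors = FALSE)
names(all_csv) <- tools::file_path_sans_ext(basename(files))

names(all_csv)
```

Naming the list elements at read-in time is what later makes it possible to reuse the file names as plot titles or output file names.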

The general structure of each of my data frames is like this:

a <- dput(head(M0_vs_M1_TCGA_stages))
structure(list(gene = c("ENSG00000000003", "ENSG00000000971", 
"ENSG00000002726", "ENSG00000003989", "ENSG00000005381", "ENSG00000006534"
), Symbol = c("TSPAN6", "CFH", "AOC1", "SLC7A2", "MPO", "ALDH3B1"
), baseMean = c(18.692748982067, 464.265236194545, 109.22179823167, 
85.504528879087, 225281.306485184, 3135.38237206618), log2FoldChange = c(1.72011856334064, 
-1.84102137729838, -1.90294968540377, -2.38723703218791, -4.71693379158602, 
-1.50626419101949), lfcSE = c(0.521825206121688, 0.528072294508922, 
0.539428712863011, 0.661673608593429, 0.523148071429431, 0.26205630469554
), stat = c(3.29635008650678, -3.48630556164743, -3.52771300456717, 
-3.60787705778782, -9.0164411362497, -5.74786472994606), pvalue = c(0.00097949874464195, 
0.00048974125782849, 0.00041916635977159, 0.00030871270363637, 
1.94298755192739e-19, 9.03774951656819e-09), padj = c(0.0133044251543343, 
0.00833058768185816, 0.00750903801425802, 0.00609902023132708, 
3.7330619835181e-15, 3.94641548776874e-06), UP_DOWN = c("UP", 
"DOWN", "DOWN", "DOWN", "Low", "DOWN")), row.names = c(NA, -6L
), class = c("tbl_df", "tbl", "data.frame"))

So I would like to pass each file (data set) to the function above, print each one as an individual plot, and keep the name of the data set in the plot.

I would really appreciate any suggestions or help.

My attempt so far

library(ggpubr)  # provides ggmaplot()

make_volcano <- function(df){
  ggmaplot(df, main = expression("Group 1" %->% "Group 2"),
           fdr = 0.05, fc = 1, size = 0.4,
           palette = c("#B31B21", "#1465AC", "darkgray"),
           genenames = as.vector(df$Symbol),
           legend = "top", top = 0,
           font.label = c("bold", 11),
           font.legend = "bold",
           font.main = "bold",
           ggtheme = ggplot2::theme_minimal())
}

plots <- lapply(all_csv, make_volcano)

This does what I need, and it was not so complicated. I still need to figure out how to save each plot under its respective file name.

Improved version of my attempt:

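bb <- all_csv

# Build one plot per data set
plot_list = list()
for (i in seq_along(bb)) {   # seq_along() is also safe for an empty list
  p = make_volcano(bb[[i]])
  plot_list[[i]] = p
}

# Write all plots into a single multi-page PDF, one plot per page
pdf("MAPLOT.pdf", height = 10, width = 15)
for (i in seq_along(bb)) {
  print(plot_list[[i]])
}
dev.off()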

The only thing I still need to add is the name of each list element in its plot, so that I can identify the plots, although they are already drawn in list order.

1 Answer

I am not on my computer and don't have R available, thus this answer is more general and should just give an idea of the principle.

You seem to have solved the problem to read in the list of files and already have the list of data sets. And you have your plotting function. Well done.

I personally prefer the "apply" family for looping: the code is slightly shorter, I find it easier to read, and it comes with less (i.e., no) danger of "growing your vectors" (see also Patrick Burns's famous The R Inferno, Circle 2).

In your case, you could therefore simply write
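
To illustrate what "growing" means here, the following sketch builds the same list three ways: by extending an empty list inside a loop, by preallocating it, and with lapply. The numbers are a toy example and have nothing to do with the plotting data.

```r
n <- 5

# Growing: the list starts empty and is extended on every iteration.
grown <- list()
for (i in seq_len(n)) {
  grown[[i]] <- i^2
}

# Preallocating: the list is created at its final length up front.
prealloc <- vector("list", n)
for (i in seq_len(n)) {
  prealloc[[i]] <- i^2
}

# lapply: the result list is built for you, with no bookkeeping.
applied <- lapply(seq_len(n), function(i) i^2)

identical(grown, applied) && identical(prealloc, applied)
```

All three give the same result. For a handful of plots the difference in speed is unlikely to matter, so this is mostly a question of readability.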

## lapply returns a list
lapply(all_csv, make_volcano)

This creates the list of plots. You now have several options for saving them. You could print them all in one figure, most easily with the patchwork package:

plots <- lapply(all_csv, make_volcano)
patchwork::wrap_plots(plots)

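If you want each plot on its own page, your approach (one multi-page PDF) is perfectly fine. Another option is ggsave, which writes one file per plot, here again with lapply. Note that ggsave takes the file name as its first argument, so the plot has to be passed as the second argument (plot):

## ggsave() takes the file name first; "plot_<i>.pdf" is just a placeholder name
lapply(seq_along(plots), function(i) ggsave(paste0("plot_", i, ".pdf"), plots[[i]], device = "pdf", width = 15))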

Naming is a bit trickier and certainly depends largely on the structure of your data set list. Is it a named list? What do you get when calling names(all_csv)?

You can use the names for the titles, as shown in this thread. This is also not the only thread on that topic; it is actually a fairly common problem here on Stack Overflow. The general idea is to loop over both the list and its names and assign the respective name to each plot. This can be achieved via indexing or with parallel looping functions such as mapply or purrr::map2. I generally like looping over indexes for those cases. You could for example do:

lapply(seq_along(all_csv), function(i){
  make_volcano(all_csv[[i]]) +
    ## I am here assuming that ggmaplot returns a ggplot object to which you can add a
    ## ggtitle layer - not sure if this really works. But hopefully you get the idea
    ggtitle(names(all_csv)[i])
})
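
The parallel-looping alternative mentioned above can be sketched with base R's Map, which walks over the data frames and their names together. Because make_volcano needs ggpubr and the real data, this sketch uses a hypothetical stand-in, `make_plot`, built with ggplot2 only, and a made-up toy list in place of `all_csv`.

```r
library(ggplot2)

# Hypothetical stand-in for make_volcano(): any function returning a ggplot object.
make_plot <- function(df) {
  ggplot(df, aes(x = log2FoldChange, y = -log10(padj))) +
    geom_point()
}

# Toy named list standing in for all_csv (values are made up).
toy <- data.frame(log2FoldChange = c(1.7, -1.8, -4.7),
                  padj = c(0.013, 0.008, 3.7e-15))
all_csv <- list(M0_vs_M1_TCGA_stages = toy, M0_vs_M2_TCGA_stages = toy)

# Map() passes the i-th data frame and the i-th name to the function together.
plots <- Map(function(df, nm) make_plot(df) + ggtitle(nm),
             all_csv, names(all_csv))

names(plots)
```

Map keeps the names of its first list argument, so the resulting list of plots stays named, which is convenient when saving the plots later.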

The same idea of looping over indexes of your names should work with ggsave, and you will get filenames that are like the read-in data files.

lapply(seq_along(plots), function(i){
  ggsave(plot = plots[[i]],
         ## paste0() avoids the space that paste() would put before ".pdf"
         filename = paste0(names(plots)[i], ".pdf"),
         width = 15)
})
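
If the PDFs should go into a folder of their own rather than the working directory, the same loop can build full paths with file.path. This is only a sketch: `out_dir` is a hypothetical folder (a temporary directory here), the two toy plots stand in for the real `plots` list, and the names are built with paste0 so that no space ends up before the extension.

```r
library(ggplot2)

# Toy named list of plots standing in for `plots` (data are made up).
toy <- data.frame(x = 1:3, y = c(2, 4, 8))
plots <- list(M0_vs_M1_TCGA_stages = ggplot(toy, aes(x, y)) + geom_point(),
              M0_vs_M2_TCGA_stages = ggplot(toy, aes(x, y)) + geom_line())

# Hypothetical output folder; a temp dir keeps the sketch from cluttering a project.
out_dir <- file.path(tempdir(), "volcano_pdfs")
dir.create(out_dir, showWarnings = FALSE)

# One PDF per plot, named after the list element.
out_files <- file.path(out_dir, paste0(names(plots), ".pdf"))
for (i in seq_along(plots)) {
  ggsave(filename = out_files[i], plot = plots[[i]], width = 15, height = 10)
}

file.exists(out_files)
```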
3 Comments

Thank you for the elaborate answer. Now I can automate a lot of my plotting, since I have many figures to make in one go.
This is what I get when I call names(all_csv):
 [1] "M0_vs_M1_TCGA_stages" "M0_vs_M2_TCGA_stages" "M0_vs_M3_TCGA_stages" "M0_vs_M4_TCGA_stages" "M0_vs_M5_TCGA_stages" "M1_vs_M2_TCGA_stages"
 [7] "M1_vs_M3_TCGA_stages" "M1_vs_M4_TCGA_stages" "M1_vs_M5_TCGA_stages" "M2_vs_M3_TCGA_stages" "M2_vs_M4_TCGA_stages" "M2_vs_M5_TCGA_stages"
[13] "M3_vs_M4_TCGA_stages" "M3_vs_M5_TCGA_stages" "M4_vs_M5_TCGA_stages"
lapply(1: length(plots), function(i){ ggsave(plot = plots[[i]], filename = paste(names(plots)[i], ".pdf"), width = 15) }) This one is better, or I would say best, since it saves each plot as a separate file with its name in the file name. Now I don't have to worry about putting a title on each plot, which I had planned to do when I was printing all of them into a single PDF.
