Reading multiple data files and passing them into a function to plot

Question

I have multiple files to plot as volcano plots. All my files are in the same folder.

Objective: I would like to read them as a list of data frames and then pass each one to a plotting function, so that I get one plot per file.

This is the function I would like to use:

EnhancedVolcano(res1,
                lab = rownames(res1),
                x = "log2FoldChange",
                y = "padj",
                #selectLab = c("APOBEC3B","CHD7","AURKB","EYA1","UHRF1","SFMBT1"),
                xlim = c(-8, 8),
                xlab = bquote(~Log[2]~ "fold change"),
                ylab = bquote(~-Log[10]~adjusted~italic(P)),
                transcriptPointSize = 10,
                transcriptLabSize = 10,
                border = "full",
                pCutoff = 0.05,
                #legendPosition = "bottom",
                borderWidth = 1.5,
                legend = c('NS', 'Log2 FC', 'Adjusted p-value',
                           'Adjusted p-value & Log2 FC'),
                legendPosition = 'bottom',
                legendLabSize = 20,
                legendIconSize = 20,
                borderColour = "blue",
                #drawConnectors = FALSE,
                #widthConnectors = 0.01,
                colConnectors = 'grey30',
                gridlines.major = FALSE,
                gridlines.minor = FALSE)

This is the list of files I intend to use:

M0_vs_M1_TCGA_stages.txt  M0_vs_M4_TCGA_stages.txt  M1_vs_M3_TCGA_stages.txt  M2_vs_M3_TCGA_stages.txt  M3_vs_M4_TCGA_stages.txt
M0_vs_M2_TCGA_stages.txt  M0_vs_M5_TCGA_stages.txt  M1_vs_M4_TCGA_stages.txt  M2_vs_M4_TCGA_stages.txt  M3_vs_M5_TCGA_stages.txt
M0_vs_M3_TCGA_stages.txt  M1_vs_M2_TCGA_stages.txt  M1_vs_M5_TCGA_stages.txt  M2_vs_M5_TCGA_stages.txt  M4_vs_M5_TCGA_stages.txt

Each of my data frames has this general structure:

a <- dput(head(M0_vs_M1_TCGA_stages))
structure(list(gene = c("ENSG00000000003", "ENSG00000000971", 
"ENSG00000002726", "ENSG00000003989", "ENSG00000005381", "ENSG00000006534"
), Symbol = c("TSPAN6", "CFH", "AOC1", "SLC7A2", "MPO", "ALDH3B1"
), baseMean = c(18.692748982067, 464.265236194545, 109.22179823167, 
85.504528879087, 225281.306485184, 3135.38237206618), log2FoldChange = c(1.72011856334064, 
-1.84102137729838, -1.90294968540377, -2.38723703218791, -4.71693379158602, 
-1.50626419101949), lfcSE = c(0.521825206121688, 0.528072294508922, 
0.539428712863011, 0.661673608593429, 0.523148071429431, 0.26205630469554
), stat = c(3.29635008650678, -3.48630556164743, -3.52771300456717, 
-3.60787705778782, -9.0164411362497, -5.74786472994606), pvalue = c(0.00097949874464195, 
0.00048974125782849, 0.00041916635977159, 0.00030871270363637, 
1.94298755192739e-19, 9.03774951656819e-09), padj = c(0.0133044251543343, 
0.00833058768185816, 0.00750903801425802, 0.00609902023132708, 
3.7330619835181e-15, 3.94641548776874e-06), UP_DOWN = c("UP", 
"DOWN", "DOWN", "DOWN", "Low", "DOWN")), row.names = c(NA, -6L
), class = c("tbl_df", "tbl", "data.frame"))

So for each file (data set) I would like to pass it to the function above, print it as an individual plot, and keep the file name in the plot.

I would really appreciate any suggestions or help.

My attempt so far

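library(ggpubr)  # provides ggmaplot(); also attaches ggplot2

# all_csv: a list of data frames, one per *_TCGA_stages.txt file
# note: ggmaplot() draws an MA plot (log2 fold change vs. mean expression)
make_volcano <- function(df) {
  ggmaplot(df, main = expression("Group 1" %->% "Group 2"),
           fdr = 0.05, fc = 1, size = 0.4,
           palette = c("#B31B21", "#1465AC", "darkgray"),
           genenames = as.vector(df$Symbol),
           legend = "top", top = 0,
           font.label = c("bold", 11),
           font.legend = "bold",
           font.main = "bold",
           ggtheme = ggplot2::theme_minimal())
}

plots <- lapply(all_csv, make_volcano)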

This does what I need; it was not so complicated. Now I need to figure out how to save each plot with its respective file name.

Improved version of my answer

bb <- all_csv

plot_list <- list()
for (i in seq_along(bb)) {  # seq_along() is safe even if bb is empty
  plot_list[[i]] <- make_volcano(bb[[i]])
}

# write all plots into one multi-page PDF, one plot per page
pdf("MAPLOT.pdf", height = 10, width = 15)
for (i in seq_along(plot_list)) {
  print(plot_list[[i]])
}
dev.off()

The only thing left is to put each list element's name into its plot so I can identify them, even though they are plotted in order.

very related, if not a duplicate: stackoverflow.com/questions/9564489/…
– tjebo, Commented Jun 7, 2022 at 6:44
I saw that post, but I was not sure about passing the list of files to my function above and printing them as different plots.
– PesKchan, Commented Jun 7, 2022 at 6:48
stackoverflow.com/questions/66038622/… stackoverflow.com/questions/67647284/… stackoverflow.com/questions/64632681/… stackoverflow.com/questions/62457314/…
– tjebo, Commented Jun 7, 2022 at 6:51
Sometimes one is stuck. I've removed the downvote. You'll learn much more trying to come to the solution yourself. However, if those threads don't help, please give us a shout.
– tjebo, Commented Jun 7, 2022 at 6:54

tjebo · Accepted Answer · 2022-06-07 12:11:01Z


I am not on my computer and don't have R available, thus this answer is more general and should just give an idea of the principle.

You seem to have solved the problem to read in the list of files and already have the list of data sets. And you have your plotting function. Well done.
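The question does not show how all_csv was built. For completeness, here is a minimal sketch of one way to read every *_TCGA_stages.txt file in a folder into a named list, so that each element carries its file name. It assumes the files are tab-delimited with a header row; read_stage_files() is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: read every *_TCGA_stages.txt file in `dir` into a named list.
# Assumes tab-delimited files with a header row (read.delim's defaults).
read_stage_files <- function(dir = ".") {
  files <- list.files(dir, pattern = "_TCGA_stages\\.txt$", full.names = TRUE)
  tables <- lapply(files, read.delim, stringsAsFactors = FALSE)
  # name each element after its file, without directory and extension
  names(tables) <- tools::file_path_sans_ext(basename(files))
  tables
}

# usage (path is a placeholder):
# all_csv <- read_stage_files("path/to/folder")
# names(all_csv)
```

Because the names come straight from the file names, they can later be reused for plot titles and output file names.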

I personally prefer the "apply" family for looping: the code is slightly shorter, I find it easier to read, and it carries less (i.e., no) danger of "growing your vectors" (see also Patrick Burns's famous "The R Inferno", chapter 2).
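For this question there is one more practical difference: a for loop that fills an empty list by index drops the names of the input, while lapply keeps them. A minimal sketch, using a hypothetical summarise_padj() as a stand-in for make_volcano() and two made-up data sets in place of all_csv:

```r
# Hypothetical stand-in for make_volcano(): counts genes with padj < 0.05
summarise_padj <- function(df) sum(df$padj < 0.05)

datasets <- list(
  A_vs_B = data.frame(padj = c(0.01, 0.20, 0.03)),
  A_vs_C = data.frame(padj = c(0.50, 0.04))
)

# "Growing" a list: it is extended on every iteration
grown <- list()
for (i in seq_along(datasets)) {
  grown[[i]] <- summarise_padj(datasets[[i]])
}
names(grown)    # NULL: the data set names are lost

# lapply allocates the result once and keeps the names of its input
applied <- lapply(datasets, summarise_padj)
names(applied)  # "A_vs_B" "A_vs_C"
```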

In your case, you could therefore simply write:

## lapply returns a list
lapply(all_csv, make_volcano)

This creates the list of plots. You now have several options for saving them. You could put them all into one figure, most easily with the patchwork package:

plots <- lapply(all_csv, make_volcano)
patchwork::wrap_plots(plots)

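If you want to create separate files, your approach is perfectly fine. Another option is ggsave. Note that the first argument of ggsave is filename, not plot, so lapply(plots, ggsave) would pass each plot in as a file name. A parallel loop such as Map supplies both, and fixed arguments such as width are simply recycled:

Map(ggsave, filename = paste0("plot_", seq_along(plots), ".pdf"), plot = plots, width = 15, device = "pdf")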

Naming is a bit trickier and depends largely on the structure of your list of data sets. Is it a named list? What do you get when calling names(all_csv)?

You can use the names for the titles, as shown in this thread. This is not the only thread on the topic; it is actually a fairly common problem here on Stack Overflow. The general idea is to loop over both the list and its names and assign the respective name to each plot. This can be achieved via indexing or with parallel looping functions such as mapply or purrr::map2. I generally like looping over indexes in these cases. You could for example do:

lapply(seq_along(all_csv), function(i) {
  ## I am here assuming that ggmaplot returns a ggplot object to which you can add a
  ## ggtitle layer - not sure if this really works. But hopefully you get the idea
  make_volcano(all_csv[[i]]) + ggtitle(names(all_csv)[i])
})
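The same pairing of data sets and names can be written with Map(), base R's wrapper around mapply(). A minimal sketch, assuming plain ggplot2: make_plot() is a hypothetical stand-in for make_volcano(), and the two tiny data sets are made up in place of all_csv.

```r
library(ggplot2)

# Hypothetical stand-in for make_volcano(): a plain ggplot2 scatter plot
make_plot <- function(df) {
  ggplot(df, aes(x = log2FoldChange, y = -log10(padj))) +
    geom_point()
}

datasets <- list(
  M0_vs_M1 = data.frame(log2FoldChange = c(1.7, -1.8), padj = c(0.013, 0.008)),
  M0_vs_M2 = data.frame(log2FoldChange = c(-2.4, 0.5), padj = c(0.006, 0.4))
)

# Map() walks the data sets and their names in parallel; the result keeps the names
plots <- Map(function(df, nm) make_plot(df) + ggtitle(nm), datasets, names(datasets))
```

Unlike the index-based lapply, the result is still a named list, so names(plots) can be reused for file names afterwards.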

The same idea of looping over indexes should work with ggsave, and you will get file names that match the read-in data files (provided plots keeps the names of all_csv, as it does when created with lapply(all_csv, make_volcano)):

lapply(seq_along(plots), function(i) {
  ggsave(plot = plots[[i]],
         filename = paste0(names(plots)[i], ".pdf"),  # paste0: no space before ".pdf"
         width = 15)
})
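If you would rather keep everything in one multi-page PDF, as in the question, the names can be added as titles while printing. A minimal sketch, assuming plain ggplot2; the two plots built from mtcars are placeholders for the real plots, and the output goes to a temporary directory.

```r
library(ggplot2)

# Placeholder named list of plots (stand-ins for the real volcano/MA plots)
plots <- list(
  M0_vs_M1_TCGA_stages = ggplot(mtcars, aes(wt, mpg)) + geom_point(),
  M0_vs_M2_TCGA_stages = ggplot(mtcars, aes(hp, mpg)) + geom_point()
)

pdf_file <- file.path(tempdir(), "MAPLOT.pdf")
pdf(pdf_file, height = 10, width = 15)
# one page per plot, each titled with its list element name
for (nm in names(plots)) {
  print(plots[[nm]] + ggtitle(nm))
}
invisible(dev.off())
```

Looping over names(plots) instead of indexes keeps each title tied to the right plot even if the list is reordered.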

edited Jun 7, 2022 at 12:11

answered Jun 7, 2022 at 12:05

tjebo



3 Comments

PesKchan Over a year ago

Thank you for the elaborate answer. Now I can automate a lot of the plotting; I have lots of figures to make in one go.

PesKchan Over a year ago

This is what I get when I call names(all_csv):

 [1] "M0_vs_M1_TCGA_stages" "M0_vs_M2_TCGA_stages" "M0_vs_M3_TCGA_stages" "M0_vs_M4_TCGA_stages" "M0_vs_M5_TCGA_stages" "M1_vs_M2_TCGA_stages"
 [7] "M1_vs_M3_TCGA_stages" "M1_vs_M4_TCGA_stages" "M1_vs_M5_TCGA_stages" "M2_vs_M3_TCGA_stages" "M2_vs_M4_TCGA_stages" "M2_vs_M5_TCGA_stages"
[13] "M3_vs_M4_TCGA_stages" "M3_vs_M5_TCGA_stages" "M4_vs_M5_TCGA_stages"

PesKchan Over a year ago

lapply(1: length(plots), function(i){ ggsave(plot = plots[[i]],         filename = paste(names(plots)[i], ".pdf"),          width = 15) })

This one is better, I would even say best, since it saves separate files named after their data sets. Now I don't have to worry about putting a title on each plot, which I had planned to do when I was printing all of them into a single PDF.
