I have a very long data frame of results. There are 148 exposures and 148 outcomes, and each exposure has been regressed against each outcome (148 * 148 = 21,904, which is the number of rows in the data frame).

I want to plot the results for each exposure against the 148 outcomes, so I want 148 plots in total. The code below does this for one exposure and generates one plot.

**Question:** What is the best way to do this for all 148 exposures and export the plots to a multi-page PDF and/or separate PDF files?

# libraries 

library(qs)
library(dplyr)
library(ggplot2)
library(ggrepel)

# make data

set.seed(15)
res_df <- data.frame(exp = randomStrings(N = 148, string_size = 4))
res_df <- data.frame(res_df[rep(seq_len(nrow(res_df)), each = 148), ])
colnames(res_df)[1] <- "exp"
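# mutate() only recycles length-1 vectors, so repeat the 148 values explicitly
res_df <- mutate(res_df,
                 y = rep(randomStrings(N = 148, string_size = 5), times = 148),
                 logp = rep(abs(rnorm(n = 148, mean = 5, sd = 6)), times = 148),
                 r = rep(rnorm(n = 148, mean = 0.5, sd = 0.1), times = 148))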

# subset df for individual plot (first exposure)

exp_a <- res_df[1, 1]
res_df_a <- subset(res_df, exp == exp_a)

# PLOT

ggplot(res_df_a, aes(x = r, y = logp, label = y)) +
  geom_point(data = res_df_a[res_df_a$logp <= 10, ], color = "grey50") +
  geom_text_repel(data = res_df_a[res_df_a$logp > 10, ], box.padding = 0.5, max.overlaps = Inf) +
  geom_point(data = res_df_a[res_df_a$logp > 10, ], color = "red") +
  xlab("Variance explained (%)") + ylab("-log10(pvalue)") +
  ggtitle("y ~ exp")

2 Answers

Rather than use your example, I provide below a simpler example using a synthetic dataset. The key to a multi-page PDF is to open the pdf device with onefile = TRUE (which is also the default for pdf()) and to print each plot while the device is still open:

# required libraries ------------------------------------------------------
library(ggplot2)

# make data ---------------------------------------------------------------
set.seed(1)
df <- data.frame(x = cumsum(rnorm(10)), y = cumsum(rnorm(10)))

# make sequential plot and send output to pdf device ----------------------
pdf("plotseq.pdf", width = 5, height = 5, onefile = TRUE)
for (i in seq_len(nrow(df))) {
  p <- ggplot(df) + aes(x = x, y = y) +
    geom_point(shape = 1) + 
    geom_point(data = df[i,]) + 
    labs(title=paste("i =", i))
  print(p)
}
dev.off()

1 Comment

Thanks for replying. With my data, wouldn't this approach give me 21,904 different plots, since it loops by row? I want to loop by exposure (148 exposures).
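
To loop by exposure rather than by row, the same pdf() pattern can iterate over the unique values of the exposure column. The following is a minimal sketch rather than the answerer's code: it builds a small stand-in for res_df (3 exposures by 5 outcomes with made-up values, assuming the same column names as in the question) so that it runs on its own, and the file name by_exposure.pdf is an arbitrary choice.

```r
library(ggplot2)

# Small stand-in for res_df (assumption: same column names as in the question)
set.seed(1)
res_df <- data.frame(
  exp  = rep(c("expA", "expB", "expC"), each = 5),
  y    = rep(paste0("out", 1:5), times = 3),
  logp = abs(rnorm(15, mean = 5, sd = 6)),
  r    = rnorm(15, mean = 0.5, sd = 0.1)
)

# Loop over exposures, not rows
exposures <- unique(res_df$exp)

# Open the device once, print one plot (one page) per exposure, then close it
pdf("by_exposure.pdf", width = 5, height = 5, onefile = TRUE)
for (e in exposures) {
  d <- res_df[res_df$exp == e, ]
  p <- ggplot(d, aes(x = r, y = logp)) +
    geom_point() +
    labs(title = paste("y ~", e))
  print(p)
}
dev.off()
```

With the full data from the question, the same loop should produce one page for each of the 148 exposures.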

I managed to figure out the answer in the end: split the one large data frame into a list of data frames (one per exposure), then plot each data frame and save the plots. Use the code in the question to make the data, and then:

library(gridExtra)
# make list of data frames

obs_lists <- split( res_df , f = res_df$exp )

# plot each df within the list and write out to PDFs

p <- lapply(obs_lists, function(d) {
  ggplot(data = d, aes(x = r, y = logp)) + geom_point()
})

# 6 per page
ggsave("multi.pdf", gridExtra::marrangeGrob(grobs = p, nrow=3, ncol=2, top = NULL))

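# 1 per page
# print each plot explicitly; a bare `p` is only auto-printed at top level,
# not inside a function or a source()d script
pdf("single.pdf", onefile = TRUE)
invisible(lapply(p, print))
dev.off()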
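
For the "separate PDF files" half of the question, one option is to call ggsave() once per plot. This is a sketch under assumptions, not a tested solution for the full data: it uses a small made-up stand-in for res_df so it runs on its own, plot_exposure() is a hypothetical helper, the output directory name plots is arbitrary, and it assumes every exposure name is safe to use as a file name.

```r
library(ggplot2)

# Small stand-in for res_df (assumption: same column names as in the question)
set.seed(1)
res_df <- data.frame(
  exp  = rep(c("expA", "expB", "expC"), each = 5),
  y    = rep(paste0("out", 1:5), times = 3),
  logp = abs(rnorm(15, mean = 5, sd = 6)),
  r    = rnorm(15, mean = 0.5, sd = 0.1)
)

# Hypothetical helper: build the plot for one exposure's data frame
plot_exposure <- function(d, name) {
  ggplot(d, aes(x = r, y = logp)) +
    geom_point(colour = "grey50") +
    geom_point(data = d[d$logp > 10, ], colour = "red") +
    labs(x = "Variance explained (%)", y = "-log10(pvalue)",
         title = paste("y ~", name))
}

# One data frame per exposure, then one plot per data frame
obs_lists <- split(res_df, f = res_df$exp)
plots <- Map(plot_exposure, obs_lists, names(obs_lists))

# Write each plot to its own file, named after the exposure
out_dir <- "plots"
dir.create(out_dir, showWarnings = FALSE)
out_files <- file.path(out_dir, paste0(names(plots), ".pdf"))
for (i in seq_along(plots)) {
  ggsave(out_files[i], plot = plots[[i]], width = 5, height = 5)
}
```

Passing the exposure name into the helper lets each plot carry its own title, which the single-exposure plot in the question hard-codes as "y ~ exp".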
