My goal is to plot several data frames (they all have the same structure) using ggplot2. I read a CSV file into a single data frame and then split it, which gives me a list of data frames:

Dataframe_A <- read.csv("mycsv.csv")
Dataframe_A_split <- split.data.frame(Dataframe_A, list(Dataframe_A$V1,Dataframe_A$V2), drop=TRUE)

A reproducible example:

Dataframe_A <- data.frame(
  y1 = c(1, 2, 3, 4, 5, 6, 7, 9, 0, 1),
  y2 = c(1, 3, 3, 4, 7, 6, 14, 9, 7, 1),
  y3 = c("Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "No", "No"),
  y4 = c("A", "A", "B", "A", "A", "B", "A", "A", "B", "A")
)
Dataframe_A_split <- split.data.frame(Dataframe_A, list(Dataframe_A$y3, Dataframe_A$y4), drop = TRUE)

Printing Dataframe_A_split gives:

$No.A
   y1 y2 y3 y4
2   2  3 No  A
5   5  7 No  A
8   9  9 No  A
10  1  1 No  A

$Yes.A
  y1 y2  y3 y4
1  1  1 Yes  A
4  4  4 Yes  A
7  7 14 Yes  A

$No.B
  y1 y2 y3 y4
3  3  3 No  B
6  6  6 No  B
9  0  7 No  B

I know I can use Dataframe_A_split[[1]] to get the first data frame, but I have twenty data frames in my list, and looping through the list with ggplot (to make a scatter plot, for example) would be more useful and easier to read. In my example I would end up with three graphs.

  • Wouldn't it be easier to keep it as one data frame and just facet on y3 and y4? Commented Mar 6, 2019 at 16:50
  • +1, you're definitely going about it the wrong way by splitting your large data.frame into 20 small data.frames. If you need to operate within groups (e.g. compute group averages), it's better to keep everything in one df and use group_by with mutate or summarise from dplyr. If you need to plot the groups separately, use the colour or fill aesthetic to give your groups different colours, or facet_* to physically separate the plots. Commented Mar 6, 2019 at 17:21
  • If you must produce separate (not faceted) ggplot2 plots, you can use dplyr::group_by and dplyr::do to create them one group at a time. This assumes that you are saving them elsewhere or rendering them iteratively within a document, not looking at them incrementally or interactively. Commented Mar 6, 2019 at 17:24
  • @antoine-sac I wouldn't say they're definitely going about it wrong, especially since they're splitting one data frame into a list of data frames. Splitting it into 20 separate data frames in their environment would probably be a bad idea. In my work, I often create a list of plots from a list of data frames when faceting would be inappropriate, e.g. the same type of plot but where each location gets a standalone output. Commented Mar 6, 2019 at 18:31
  • @camille Yes, you're right. I assumed the use case did not actually require splitting, but there are indeed valid use cases for it, and the purrr package handles them nicely. Although in that case I would probably use nest. Commented Mar 7, 2019 at 9:43
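A minimal sketch of the nest approach mentioned in the last comment, assuming dplyr, tidyr (>= 1.0) and purrr are installed, and recreating the question's Dataframe_A so it runs on its own. Each (y3, y4) combination becomes one row that holds its own data frame in a `data` list column, and purrr::map2() adds a `plot` list column; the names `nested` and `plot` are illustrative choices, not part of any answer. (dplyr::do(), suggested in an earlier comment, still works but is superseded in current dplyr.)

```r
library(ggplot2)
library(dplyr)
library(tidyr)
library(purrr)

Dataframe_A <- data.frame(
  y1 = c(1, 2, 3, 4, 5, 6, 7, 9, 0, 1),
  y2 = c(1, 3, 3, 4, 7, 6, 14, 9, 7, 1),
  y3 = c("Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "No", "No"),
  y4 = c("A", "A", "B", "A", "A", "B", "A", "A", "B", "A")
)

# One row per (y3, y4) combination; the remaining columns (y1, y2)
# are stored as a small data frame in the list column `data`
nested <- Dataframe_A %>%
  group_by(y3, y4) %>%
  nest() %>%
  ungroup()

# Build one scatter plot per group, titled with the group's labels
nested <- nested %>%
  mutate(plot = map2(data, paste(y3, y4, sep = "."), function(df, name) {
    ggplot(df, aes(x = y1, y = y2)) +
      geom_point() +
      labs(title = name)
  }))

# Display (or save) any single plot from the list column
nested$plot[[1]]
```

Keeping the plots next to their grouping keys in one tibble means you can filter, e.g. `filter(nested, y3 == "Yes")`, to pick the plots you want instead of remembering list positions.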

2 Answers

What you actually want is probably a faceted plot:

ggplot(Dataframe_A) +
  geom_point(aes(x = y1, y = y2)) +
  facet_grid(y3 ~ y4)

Consider mapping one of the grouping variables to an aesthetic (here colour) to avoid having too many panels.

ggplot(Dataframe_A) +
  geom_point(aes(x = y1, y = y2, colour = y3)) +
  facet_wrap(~y4)

The possibilities are endless:

ggplot(Dataframe_A) +
  geom_point(aes(x = y1, y = y2, colour = y3, shape = y4), size = 5)

Like I said in my comment above, if there's a reason you need to work on separate data frames, it's not wrong to use a list of data frames. Just think about your intentions. I often do this when I need the same type of plot repeated for different groups that each need their own separate output. Facets are great when you have a reason to compare groups; think of them as small multiples.

You can use a function that works across a list to create the plots. I'm partial to the purrr::map_* family, but the base apply family works too. Using purrr::imap gives you access to the names created by splitting, so you can identify the plots easily.

library(tidyverse)

plot_list <- Dataframe_A_split %>%
  imap(function(df, name) {
    ggplot(df, aes(x = y1, y = y2, color = y3)) +
      geom_point() +
      labs(title = name)
  })

plot_list$Yes.A

Created on 2019-03-06 by the reprex package (v0.2.1)
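For comparison, a base-R sketch of the same idea: Map() walks the list and its names in parallel (playing the role of purrr::imap()), and ggplot2::ggsave() writes each plot to its own file, which covers the "standalone output per group" case. The output directory (tempdir()) and the file naming scheme are illustrative assumptions; the example data is recreated so the block runs on its own.

```r
library(ggplot2)

Dataframe_A <- data.frame(
  y1 = c(1, 2, 3, 4, 5, 6, 7, 9, 0, 1),
  y2 = c(1, 3, 3, 4, 7, 6, 14, 9, 7, 1),
  y3 = c("Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "No", "No"),
  y4 = c("A", "A", "B", "A", "A", "B", "A", "A", "B", "A")
)
Dataframe_A_split <- split(Dataframe_A, list(Dataframe_A$y3, Dataframe_A$y4), drop = TRUE)

# Map() pairs each data frame with its name; the result keeps the
# names of the first argument (No.A, Yes.A, No.B)
plot_list <- Map(function(df, name) {
  ggplot(df, aes(x = y1, y = y2)) +
    geom_point() +
    labs(title = name)
}, Dataframe_A_split, names(Dataframe_A_split))

# Write each plot to a separate PDF; directory and file names are illustrative
out_dir <- tempdir()
for (name in names(plot_list)) {
  ggsave(file.path(out_dir, paste0(name, ".pdf")), plot_list[[name]],
         width = 5, height = 4)
}
```

Because the list is named, `plot_list$Yes.A` works here exactly as in the purrr version above.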
