Filtering the x values while using ggplot

Question

I want to create boxplots comparing the analyte concentrations, grouping the samples by which donor they came from (D1 to D4), which virus they contained (VEH, HCV, or HIV), and whether or not they were incubated with CO2 (+ or - CO2). ALL of this can be determined from the sample name. For example, the first sample, D1VEH+CO2, came from Donor 1, had the virus "VEH" (which technically isn't a virus, but that's beside the point), and was incubated with CO2. I don't have to do all of these at once - I'll create a series of different boxplots. The thing I'm struggling with is isolating the different groups within the mappings. For example, see the command below:

ggplot(data = df, mapping = aes(x = AnalyteSample, y = A)) + geom_boxplot()

Now this gives me many boxplots of ALL the samples. What if I only want the boxplots of the samples containing the virus HIV? How do I filter the AnalyteSample column within a ggplot command?

df <- structure(list(AnalyteSample = c("D1VEH+CO2", "D1HCV+CO2", "D1VEH-CO2", 
"D1HCV-CO2", "D2VEH+CO2", "D2HCV+CO2", "D2VEH-CO2", "D2HCV-CO2", 
"D3VEH+CO2", "D3HCV+CO2", "D3VEH-CO2", "D3HCV-CO2", "D4VEH+CO2", 
"D4VEH-CO2"), A = c("4190", "6665", "7435", "2052", "783", "322", 
"199", "90", "46", "17", "8", "3", "3", NA), B = c("11569", "6677", 
"3852", "983.88", "589", "359", "203", "68", "33", "12", "6", 
NA, "4", NA), C = c("20453", "7699", "2499", "707.98", "412", 
"328", "156", "88", "39", "27", "17", NA, NA, NA), D = c("7893", 
NA, "1623", "685.64", "321", "644", "112", "65", "35", "29", 
"9", "5", NA, NA), E = c("320", "15444", "2049", "1065", "389", 
"365", "145", "77", "38", "16", "9", "6", NA, NA), F = c("7438", 
NA, "3472", "1057", "563", "401", "167", "89", "46", "19", "6", 
NA, NA, NA), G = c(7345, 9001, 2473, 1138, 516, 403, 134, 81, 
37, 17, 8, 6, 4, 3), H = c("9004", "3998", "2299", "964.88", 
"499", "341", "112", "88", "39", "32", NA, NA, NA, NA), I = c("8434", 
"8700", "2217", "1263", "567", "352", "153", "80", "43", "18", 
"9", "2", "3", NA), J = c("7734", "6733", "2092", "1115", "637", 
"332", "155", "82", "37", "17", "10", "4", "1", NA), K = c(NA, 
NA, "2118", "862.13", "426", "355", "143", "78", "44", "22", 
"11", NA, NA, NA), L = c(6345, 7688, 2311, 1195, 647, 366, 177, 
83, 41, 20, 8, 6, 3, 2), M = c("4222", NA, "1846", "814.61", 
"422", "314", "154", "86", "41", "27", "21", NA, NA, NA), N = c("6773", 
"8934", "2381", "1221", "677", "356", "146", "89", "40", "17", 
"10", "5", "2", NA), O = c(NA, NA, NA, "564.5", "226", "476", 
"111", "60", "32", "36", "18", NA, NA, NA)), row.names = c(NA, 
-14L), class = "data.frame")

Allan Cameron · Accepted Answer · 2020-08-01 13:17:41Z

It's far easier if you separate your AnalyteSample column into its component parts. (Thanks to Tjebo for pointing out this is better than using substring.)

library(ggplot2)
library(dplyr)

# Split after the 2nd and 5th characters: "D1VEH+CO2" -> "D1", "VEH", "+CO2"
df_split <- df %>%
  tidyr::separate(AnalyteSample, c("Donor", "Virus", "CO2"), sep = c(2, 5))

# Column A is stored as character, so convert it to numeric for plotting.
# One panel per CO2 condition:
ggplot(df_split, mapping = aes(x = Donor, y = as.numeric(A))) +
  geom_boxplot() +
  facet_grid(. ~ CO2)

# One panel per virus:
ggplot(df_split, mapping = aes(x = Donor, y = as.numeric(A))) +
  geom_boxplot() +
  facet_grid(. ~ Virus)
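
To show boxplots for a single virus only, as the question asks, one option is to filter on the new Virus column before the data reaches ggplot. The following is a minimal sketch, not part of the original answer. The name `df_small` is made up for illustration and holds a few rows copied from the question's data. The posted data excerpt contains only VEH and HCV samples, so the sketch filters for "HCV"; with the full data you would presumably swap in "HIV".

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Illustrative subset: a few rows copied from the question's data
# (column A is stored as character there, so it is character here too)
df_small <- data.frame(
  AnalyteSample = c("D1VEH+CO2", "D1HCV+CO2", "D1VEH-CO2", "D1HCV-CO2",
                    "D2VEH+CO2", "D2HCV+CO2", "D2VEH-CO2", "D2HCV-CO2"),
  A = c("4190", "6665", "7435", "2052", "783", "322", "199", "90"),
  stringsAsFactors = FALSE
)

hcv_only <- df_small %>%
  # Split the sample name after the 2nd and 5th characters
  separate(AnalyteSample, c("Donor", "Virus", "CO2"), sep = c(2, 5)) %>%
  # Convert the concentration to numeric once, up front
  mutate(A = as.numeric(A)) %>%
  # Keep only the rows for the virus of interest
  dplyr::filter(Virus == "HCV")

# Boxplot of analyte A for HCV samples, grouped by CO2 condition
p <- ggplot(hcv_only, aes(x = CO2, y = A)) +
  geom_boxplot()
print(p)
```

If you would rather not split the column, filtering on the raw name with something like `dplyr::filter(grepl("HCV", AnalyteSample, fixed = TRUE))` should select the same rows, though the separated columns are easier to map to `x` or to facets afterwards.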

edited Aug 1, 2020 at 13:17

answered Jul 31, 2020 at 21:02


6 Comments

tjebo Over a year ago

your column split looks a bit too complicated maybe? Something like tidyr::separate would do a similar job with less code, I guess... ?

Allan Cameron Over a year ago

Thanks @Tjebo. As I was writing this I thought "it would be nice if tidyr::separate took numeric values to split on". I didn't realise it could! Now I know :)

Ree Nadeau Over a year ago

What are the c(2, 5) for? Please excuse me if this is a silly question - I'm still new to R

Allan Cameron Over a year ago

@ReeNadeau these are the number of characters after which we split the string. "ABCDEFG" would split into "AB", "CDE", "FG"

Ree Nadeau Over a year ago

Ohhh okay. Do underscores or dashes count as characters?
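
On that last question: when `sep` is a numeric vector, `tidyr::separate` splits purely by character position, so every character counts, including underscores, dashes, and the `+` in the sample names. A minimal sketch to check this yourself; the `demo` data frame and its column names are made up for illustration, and only "D1VEH+CO2" comes from the question's data:

```r
library(tidyr)

# Hypothetical demo strings, one of them containing an underscore and a dash
demo <- data.frame(x = c("ABCDEFG", "D1VEH+CO2", "D1_VEH-CO2"),
                   stringsAsFactors = FALSE)

# Split after the 2nd and 5th characters, whatever those characters are
parts <- separate(demo, x, into = c("first", "second", "third"), sep = c(2, 5))
print(parts)
```

In the third string the underscore takes up position 3, so the middle piece becomes "_VE" rather than "VEH". If sample names did contain a consistent delimiter, passing that delimiter as a character `sep` (for example `sep = "_"`) would likely be more robust than fixed positions.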
