I want to create boxplots comparing the analyte concentrations, grouping the samples by which donor they came from (D1 to D4), which virus they contained (VEH, HCV, or HIV), and whether or not they were incubated with CO2 (+ or - CO2). All of this can be determined from the sample name. For example, the first sample, D1VEH+CO2, came from Donor 1, had the virus "VEH" (which technically isn't a virus, but that's beside the point), and was incubated with CO2. I don't have to do all of these at once; I'll create a series of different boxplots. What I'm struggling with is isolating the different groups within the mappings. For example, see the command below:
ggplot(data = df, mapping = aes(x = AnalyteSample, y = A)) + geom_boxplot()
This gives me one boxplot for every sample. What if I only want the boxplots of the samples containing the virus HIV? How do I filter the AnalyteSample column within a ggplot command? My data:
structure(list(AnalyteSample = c("D1VEH+CO2", "D1HCV+CO2", "D1VEH-CO2",
"D1HCV-CO2", "D2VEH+CO2", "D2HCV+CO2", "D2VEH-CO2", "D2HCV-CO2",
"D3VEH+CO2", "D3HCV+CO2", "D3VEH-CO2", "D3HCV-CO2", "D4VEH+CO2",
"D4VEH-CO2"), A = c("4190", "6665", "7435", "2052", "783", "322",
"199", "90", "46", "17", "8", "3", "3", NA), B = c("11569", "6677",
"3852", "983.88", "589", "359", "203", "68", "33", "12", "6",
NA, "4", NA), C = c("20453", "7699", "2499", "707.98", "412",
"328", "156", "88", "39", "27", "17", NA, NA, NA), D = c("7893",
NA, "1623", "685.64", "321", "644", "112", "65", "35", "29",
"9", "5", NA, NA), E = c("320", "15444", "2049", "1065", "389",
"365", "145", "77", "38", "16", "9", "6", NA, NA), F = c("7438",
NA, "3472", "1057", "563", "401", "167", "89", "46", "19", "6",
NA, NA, NA), G = c(7345, 9001, 2473, 1138, 516, 403, 134, 81,
37, 17, 8, 6, 4, 3), H = c("9004", "3998", "2299", "964.88",
"499", "341", "112", "88", "39", "32", NA, NA, NA, NA), I = c("8434",
"8700", "2217", "1263", "567", "352", "153", "80", "43", "18",
"9", "2", "3", NA), J = c("7734", "6733", "2092", "1115", "637",
"332", "155", "82", "37", "17", "10", "4", "1", NA), K = c(NA,
NA, "2118", "862.13", "426", "355", "143", "78", "44", "22",
"11", NA, NA, NA), L = c(6345, 7688, 2311, 1195, 647, 366, 177,
83, 41, 20, 8, 6, 3, 2), M = c("4222", NA, "1846", "814.61",
"422", "314", "154", "86", "41", "27", "21", NA, NA, NA), N = c("6773",
"8934", "2381", "1221", "677", "356", "146", "89", "40", "17",
"10", "5", "2", NA), O = c(NA, NA, NA, "564.5", "226", "476",
"111", "60", "32", "36", "18", NA, NA, NA)), row.names = c(NA,
-14L), class = "data.frame")
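To make the naming scheme described above concrete, here is a minimal sketch of how a sample name could be split into its three parts and then filtered before plotting. It assumes every name follows the pattern donor + three-letter virus code + sign + "CO2". The small `df` below is only a stand-in for the first rows of my data, the column names `Donor`, `Virus`, and `CO2` are made up for illustration, and HCV is used for the filter because the rows shown contain no HIV samples.

```r
library(ggplot2)

# Stand-in for the first eight rows of the data above (one analyte column only)
df <- data.frame(
  AnalyteSample = c("D1VEH+CO2", "D1HCV+CO2", "D1VEH-CO2", "D1HCV-CO2",
                    "D2VEH+CO2", "D2HCV+CO2", "D2VEH-CO2", "D2HCV-CO2"),
  A = c("4190", "6665", "7435", "2052", "783", "322", "199", "90"),
  stringsAsFactors = FALSE
)

# Assumed pattern: donor (D + digits), virus (three capitals), CO2 sign
pattern <- "^(D[0-9]+)([A-Z]{3})([+-])CO2$"

# Hypothetical grouping columns derived from the sample name
df$Donor <- sub(pattern, "\\1", df$AnalyteSample)
df$Virus <- sub(pattern, "\\2", df$AnalyteSample)
df$CO2   <- sub(pattern, "\\3", df$AnalyteSample)

# Column A is stored as character in the dput output, so convert it
df$A <- as.numeric(df$A)

# Filter the data before handing it to ggplot
hcv <- subset(df, Virus == "HCV")

# One box per CO2 condition, HCV samples only
p <- ggplot(data = hcv, mapping = aes(x = CO2, y = A)) + geom_boxplot()
print(p)
```

Filtering the data frame before the `ggplot()` call, rather than inside `aes()`, keeps the mapping simple. The same derived columns could also be used as `x`, `fill`, or in `facet_wrap()` for the other groupings.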

