I have the following example dataframe in R:
SampleID <- c("A", "A", "A", "A", "B", "B", "C", "C", "C", "C", "C", "C", "D", "D", "E", "E", "E", "E", "F", "F")
Analyte <- c("A1", "A1", "A2", "A2", "B1", "B2", "C1", "C1", "C1", "C2", "C2", "C2", "D1", "D2", "E1", "E1", "E2", "E2", "F1", "F2")
Fraction <- c("Dissolved", "Total", "Dissolved", "Total", "Total", "Total", "Dissolved", "Suspended", "Total", "Dissolved", "Suspended", "Total", "Unknown", "Unknown", "Dissolved", "Suspended", "Dissolved", "Suspended", "Dissolved", "Dissolved")
Concentration <- c(4.2, 5.6, 8.6, 11.2, 2.1, 9.6, 15.6, 28.7, 42.3, 18.3, 23.2, 48.6, 6.4, 28.8, 9.1, 32.5, 36.4, 24.5, 10.7, 3.4)
MyData <- data.frame(SampleID, Analyte, Fraction, Concentration)
MyData
   SampleID Analyte  Fraction Concentration
1         A      A1 Dissolved           4.2
2         A      A1     Total           5.6
3         A      A2 Dissolved           8.6
4         A      A2     Total          11.2
5         B      B1     Total           2.1
6         B      B2     Total           9.6
7         C      C1 Dissolved          15.6
8         C      C1 Suspended          28.7
9         C      C1     Total          42.3
10        C      C2 Dissolved          18.3
11        C      C2 Suspended          23.2
12        C      C2     Total          48.6
13        D      D1   Unknown           6.4
14        D      D2   Unknown          28.8
15        E      E1 Dissolved           9.1
16        E      E1 Suspended          32.5
17        E      E2 Dissolved          36.4
18        E      E2 Suspended          24.5
19        F      F1 Dissolved          10.7
20        F      F2 Dissolved           3.4
I would like to do the following:
For each SampleID, if an Analyte has a "Total" Fraction reported, retain only that row for the Analyte and remove rows with any other Fraction value (i.e., Dissolved, Suspended) for that Analyte.
If an Analyte for a SampleID includes both Dissolved and Suspended in the Fraction column (and no other values for Fraction), sum the concentrations for Dissolved and Suspended and add a row for that Analyte with the Fraction column labeled Total and the Concentration column listing the sum. Remove the original rows for Dissolved and Suspended for that Analyte.
So for the dataframe above, the two Analytes of SampleID "A" have Dissolved and Total, so I would want to remove the rows with the Dissolved Fraction. For SampleID "C", I would want to remove the Dissolved and Suspended Fractions of both Analytes and just keep the rows with Total. And lastly, for SampleID "E", the Dissolved and Suspended Fractions for each of the two Analytes would be summed together and the result would be a new row for each Analyte that represents the sum (relabeled as Total), and the rows associated with the Dissolved and Suspended Fractions would be removed.
The desired output for the dataframe MyData above would be the following:
   SampleID Analyte  Fraction Concentration
2         A      A1     Total           5.6
4         A      A2     Total          11.2
5         B      B1     Total           2.1
6         B      B2     Total           9.6
9         C      C1     Total          42.3
12        C      C2     Total          48.6
13        D      D1   Unknown           6.4
14        D      D2   Unknown          28.8
15        E      E1     Total          41.6
17        E      E2     Total          60.9
19        F      F1 Dissolved          10.7
20        F      F2 Dissolved           3.4
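For reference, here is a minimal base-R sketch of how the two rules above might be applied per SampleID/Analyte group. It is one possible approach, not necessarily the best one for a large dataset. The names collapse_fraction, groups, and Result are hypothetical, and stringsAsFactors = FALSE is an assumption added so that Fraction can be relabeled freely. Groups that match neither rule (e.g., only Unknown or only Dissolved) are passed through unchanged, and row names are reset rather than preserved.

```r
# Rebuild the example data so the sketch runs on its own.
SampleID <- c("A", "A", "A", "A", "B", "B", "C", "C", "C", "C", "C", "C", "D", "D", "E", "E", "E", "E", "F", "F")
Analyte <- c("A1", "A1", "A2", "A2", "B1", "B2", "C1", "C1", "C1", "C2", "C2", "C2", "D1", "D2", "E1", "E1", "E2", "E2", "F1", "F2")
Fraction <- c("Dissolved", "Total", "Dissolved", "Total", "Total", "Total", "Dissolved", "Suspended", "Total", "Dissolved", "Suspended", "Total", "Unknown", "Unknown", "Dissolved", "Suspended", "Dissolved", "Suspended", "Dissolved", "Dissolved")
Concentration <- c(4.2, 5.6, 8.6, 11.2, 2.1, 9.6, 15.6, 28.7, 42.3, 18.3, 23.2, 48.6, 6.4, 28.8, 9.1, 32.5, 36.4, 24.5, 10.7, 3.4)
# Assumption: keep Fraction as character so it can be relabeled as "Total".
MyData <- data.frame(SampleID, Analyte, Fraction, Concentration,
                     stringsAsFactors = FALSE)

# Hypothetical helper: collapse the rows of one SampleID/Analyte group.
collapse_fraction <- function(grp) {
  fr <- grp$Fraction
  if ("Total" %in% fr) {
    # Rule 1: a reported Total wins; drop every other fraction.
    return(grp[fr == "Total", , drop = FALSE])
  }
  if (setequal(fr, c("Dissolved", "Suspended"))) {
    # Rule 2: only Dissolved and Suspended are present; replace them
    # with a single row holding their sum, labeled Total.
    out <- grp[1, , drop = FALSE]
    out$Fraction <- "Total"
    out$Concentration <- sum(grp$Concentration)
    return(out)
  }
  # Neither rule applies: leave the group untouched.
  grp
}

# Split into one data frame per SampleID/Analyte pair, dropping empty pairs.
groups <- split(MyData, list(MyData$SampleID, MyData$Analyte), drop = TRUE)
Result <- do.call(rbind, lapply(groups, collapse_fraction))

# Restore a readable order and reset the row names.
Result <- Result[order(Result$SampleID, Result$Analyte), ]
rownames(Result) <- NULL
Result
```

On the example data this should give 12 rows, with the E1 and E2 rows carrying the summed concentrations. A grouped summarise in a package such as dplyr or data.table would be a natural alternative if the per-group loop turns out to be slow.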
Note that the example I have provided is just a small subset of a much larger dataset that includes hundreds of SampleIDs, but the Fraction column can only equal the values listed in the original dataframe above (i.e., Dissolved, Suspended, Total, or Unknown).
Thank you!