1

I have the following example dataframe in R:

SampleID <- c("A", "A", "A", "A", "B", "B", "C", "C", "C", "C", "C", "C", "D", "D", "E", "E", "E", "E", "F", "F")
Analyte <- c("A1", "A1", "A2", "A2", "B1", "B2", "C1", "C1", "C1", "C2", "C2", "C2", "D1", "D2", "E1", "E1", "E2", "E2", "F1", "F2")
Fraction <- c("Dissolved", "Total", "Dissolved", "Total", "Total", "Total", "Dissolved", "Suspended", "Total", "Dissolved", "Suspended", "Total", "Unknown", "Unknown", "Dissolved", "Suspended", "Dissolved", "Suspended", "Dissolved", "Dissolved")
Concentration <- c(4.2, 5.6, 8.6, 11.2, 2.1, 9.6, 15.6, 28.7, 42.3, 18.3, 23.2, 48.6, 6.4, 28.8, 9.1, 32.5, 36.4, 24.5, 10.7, 3.4)
MyData <- data.frame(SampleID, Analyte, Fraction, Concentration)

MyData

   SampleID Analyte  Fraction Concentration
1         A      A1 Dissolved           4.2
2         A      A1     Total           5.6
3         A      A2 Dissolved           8.6
4         A      A2     Total          11.2
5         B      B1     Total           2.1
6         B      B2     Total           9.6
7         C      C1 Dissolved          15.6
8         C      C1 Suspended          28.7
9         C      C1     Total          42.3
10        C      C2 Dissolved          18.3
11        C      C2 Suspended          23.2
12        C      C2     Total          48.6
13        D      D1   Unknown           6.4
14        D      D2   Unknown          28.8
15        E      E1 Dissolved           9.1
16        E      E1 Suspended          32.5
17        E      E2 Dissolved          36.4
18        E      E2 Suspended          24.5
19        F      F1 Dissolved          10.7
20        F      F2 Dissolved           3.4

I would like to do the following:

  1. For each SampleID, if an Analyte has a "Total" Fraction reported, retain only that row for the Analyte and remove rows with any other Fraction value (i.e., Dissolved, Suspended) for that Analyte.

  2. If an Analyte for a SampleID includes both Dissolved and Suspended in the Fraction column (and no other values for Fraction), sum the concentrations for Dissolved and Suspended and add a row for that Analyte with the Fraction column labeled Total and the Concentration column listing the sum. Remove the original rows for Dissolved and Suspended for that Analyte.

So for the dataframe above, the two Analytes of SampleID "A" have Dissolved and Total, so I would want to remove the rows with the Dissolved Fraction. For SampleID "C", I would want to remove the Dissolved and Suspended Fractions of both Analytes and just keep the rows with Total. And lastly, for SampleID "E", the Dissolved and Suspended Fractions for each of the two Analytes would be summed together and the result would be a new row for each Analyte that represents the sum (relabeled as Total), and the rows associated with the Dissolved and Suspended Fractions would be removed.

The output of the above dataframe MyData would be the following:

   SampleID Analyte  Fraction Concentration
2         A      A1     Total           5.6
4         A      A2     Total          11.2
5         B      B1     Total           2.1
6         B      B2     Total           9.6
9         C      C1     Total          42.3
12        C      C2     Total          48.6
13        D      D1   Unknown           6.4
14        D      D2   Unknown          28.8
15        E      E1     Total          41.6 
17        E      E2     Total          60.9
19        F      F1 Dissolved          10.7
20        F      F2 Dissolved           3.4

Note that the example I have provided is just a small subset of a much larger dataset that includes hundreds of SampleIDs, but the Fraction column can only equal the values listed in the original dataframe above (i.e., Dissolved, Suspended, Total, or Unknown).

Thank you!

2 Answers 2

1

You can also use the following solution. It may sound a bit verbose but will also get the job done:

library(dplyr)
library(purrr)


MyData %>%
  group_split(SampleID, Analyte) %>%
  map(~ if("Total" %in% .x$Fraction) {
    .x %>% filter(Fraction == "Total")} else {
      .x
    }) %>%
  map(~ if(all(c("Dissolved", "Suspended") %in% .x$Fraction)) {
    add_row(.x, SampleID = .x$SampleID[1], Analyte = .x$Analyte[1], 
            Fraction = "Total", Concentration = sum(.x$Concentration))
  } else {
    .x
  }) %>%
  map_dfr(~ if("Total" %in% .x$Fraction) {
    .x %>% filter(Fraction == "Total")} else {
      .x
    })


# A tibble: 12 x 4
   SampleID Analyte Fraction  Concentration
   <chr>    <chr>   <chr>             <dbl>
 1 A        A1      Total               5.6
 2 A        A2      Total              11.2
 3 B        B1      Total               2.1
 4 B        B2      Total               9.6
 5 C        C1      Total              42.3
 6 C        C2      Total              48.6
 7 D        D1      Unknown             6.4
 8 D        D2      Unknown            28.8
 9 E        E1      Total              41.6
10 E        E2      Total              60.9
11 F        F1      Dissolved          10.7
12 F        F2      Dissolved           3.4
Sign up to request clarification or add additional context in comments.

11 Comments

Have you tried with if/else if/else in a single map
What I find in your post is that you are looping 3 times with map. You have a single if/else in each map.
Or are you saying that it should be sequential after each filter
In some cases, it is needed sequential i.e. all(c("Dissolved", "Suspended") may not be TRUE if it is not filtered. I haven't run the whole code
Oh thank god! I don't have to be worried about OP running into a problem with my solution. Yes it's weird I noticed sometimes it doesn't run. Thank you for your time I really appreciate your help :)
|
0

This could be done as:

library(tidyverse)
MyData %>%
  pivot_wider(c(SampleID, Analyte),Fraction, values_from = Concentration) %>%
  mutate(Total = coalesce(Total, Dissolved + Suspended), 
         Dissolved = ifelse(is.na(Total)&is.na(Suspended), Dissolved, NA),
         Suspended = ifelse(is.na(Total)&is.na(Dissolved), Suspended, NA)) %>%
  pivot_longer(-c(SampleID, Analyte), values_drop_na = TRUE)

# A tibble: 12 x 4
   SampleID Analyte name      value
   <chr>    <chr>   <chr>     <dbl>
 1 A        A1      Total       5.6
 2 A        A2      Total      11.2
 3 B        B1      Total       2.1
 4 B        B2      Total       9.6
 5 C        C1      Total      42.3
 6 C        C2      Total      48.6
 7 D        D1      Unknown     6.4
 8 D        D2      Unknown    28.8
 9 E        E1      Total      41.6
10 E        E2      Total      60.9
11 F        F1      Dissolved  10.7
12 F        F2      Dissolved   3.4
  

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.