I have been struggling to create a new column based on the values of certain rows and columns in another dataframe. I have a data frame that looks like this:
| sample_ID | date | test_result |
|---|---|---|
| sample1 | 1/1/2022 | positive |
| sample1 | 1/1/2022 | negative |
| sample1 | 1/1/2022 | negative |
| sample2 | 2/1/2022 | positive |
| sample2 | 3/1/2022 | negative |
| sample3 | 4/1/2022 | negative |
| sample3 | 5/1/2022 | positive |
| sample4 | 5/1/2022 | negative |
| sample4 | 6/1/2022 | negative |
| sample4 | 7/1/2022 | negative |
I want to create a new column with a decision of the final result of each sample ID. If sample is positive at any date, the final result will be the test result of the earliest date of positivity, otherwise the sample is negative. The results should look like this:
| sample_ID | date | test_result | final_result |
|---|---|---|---|
| sample1 | 1/1/2022 | positive | positive |
| sample1 | 1/1/2022 | negative | positive |
| sample1 | 1/1/2022 | negative | positive |
| sample2 | 2/1/2022 | positive | positive |
| sample2 | 3/1/2022 | negative | positive |
| sample3 | 4/1/2022 | negative | positive |
| sample3 | 5/1/2022 | positive | positive |
| sample4 | 5/1/2022 | negative | negative |
| sample4 | 6/1/2022 | negative | negative |
| sample4 | 7/1/2022 | negative | negative |
I did try with ifelse and loop but it was not successful. I would appreciate any help. Thank you very much.