1

I have two dataframes, one old and one new. Both dataframes contain the same IDs but the values in the columns can differ amongst the dataframes. The old dataframe contains the 'correct' values and therefor need to be replaced in the new information.

This is the old dataframe:

structure(list(ID = c(1, 2, 3, 4, 5, 6, 7), a = c("hi", "ho", 
NA, "hu", "ha", NA, "he"), b = c(1, 1, NA, 1, 1, NA, 1), c = c("ri", 
"ro", NA, "ru", NA, NA, "re"), d = c(2, 2, NA, 2, NA, NA, 2)), row.names = c(NA, 
-7L), class = c("tbl_df", "tbl", "data.frame"))

 ID a         b c         d

 1 hi        1 ri        2
 2 ho        1 ro        2
 3 NA       NA NA       NA
 4 hu        1 ru        2
 5 ha        1 NA       NA
 6 NA       NA NA       NA
 7 he        1 re        2

And this the new dataframe:

structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 10, 11), a = c("hi", 
"ho", NA, "hu", "ha", NA, "he", "hii", "hoo"), b = c(3, 1, NA, 
1, 1, NA, 1, 1, 1), c = c("ri", "ro", NA, "ru", "ra", NA, "re", 
NA, "roo"), d = c(3, 2, NA, 2, 2, NA, 2, NA, 2)), row.names = c(NA, 
-9L), class = c("tbl_df", "tbl", "data.frame"))

  ID a         b c         d
     1 hi        3 ri        3
     2 ho        1 ro        2
     3 NA       NA NA       NA
     4 hu        1 ru        2
     5 ha        1 ra        2
     6 NA       NA NA       NA
     7 he        1 re        2
    10 hii       1 NA       NA
    11 hoo       1 roo       2

The IDs from the old df are included in the new df but in the new df there are also new IDs. These need to be included in the final output. The values that are different in the new df need to be replaced by the values in the old df. Also, new data can be added in the columns (ID number 5 has new data in columns c and d). These also need to be included in the final output.

The final output should look like:

structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 10, 11), a = c("hi", 
"ho", NA, "hu", "ha", NA, "he", "hii", "hoo"), b = c(1, 1, NA, 
1, 1, NA, 1, 1, 1), c = c("ri", "ro", NA, "ru", "ra", NA, "re", 
NA, "roo"), d = c(2, 2, NA, 2, 2, NA, 2, NA, 2)), row.names = c(NA, 
-9L), class = c("tbl_df", "tbl", "data.frame"))



  ID a         b c         d
     1 hi        1 ri        2
     2 ho        1 ro        2
     3 NA       NA NA       NA
     4 hu        1 ru        2
     5 ha        1 ra        2
     6 NA       NA NA       NA
     7 he        1 re        2
    10 hii       1 NA       NA
    11 hoo       1 roo       2

Can anyone help achieve this? I've seen the function match but this seems to only work for certain columns.

Thanks a lot!

1 Answer 1

1

Call first sample df old_df, call second new_df. It sounds like essentially you want to update rows in new_df with values from old_df, retaining all non-matching rows in new_df:

library(dplyr)
new_df %>% rows_update(old_df, by = "ID")

Gives:

# A tibble: 9 x 5
     ID a         b c         d
  <dbl> <chr> <dbl> <chr> <dbl>
1     1 hi        1 ri        2
2     2 ho        1 ro        2
3     3 NA       NA NA       NA
4     4 hu        1 ru        2
5     5 ha        1 NA       NA
6     6 NA       NA NA       NA
7     7 he        1 re        2
8    10 hii       1 NA       NA
9    11 hoo       1 roo       2
Sign up to request clarification or add additional context in comments.

1 Comment

Perfect! Thank you

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.