
I am trying to replace the values in two columns with values from another two columns. This is a rather basic question that has been asked by Python users; however, I am using R.

I have a data frame, df, that looks like this (only on a much larger scale, >20,000 rows):

squirrel_id    locx    locy    dist
6391           17.5    10.0    50.0
6391           17.5    10.0    20.0
6391           17.5    10.0    15.5
8443           20.5    1.0     800
6025           -5.0    -0.5    0.0

I need to replace the locx and locy values for 63 squirrels.

I normally replace values with the following code:

library(dplyr)

# numeric (unquoted) replacement values keep locx and locy numeric
df <- df %>%
  mutate(locx = ifelse(squirrel_id == 6391, 12.5, locx),
         locy = ifelse(squirrel_id == 6391, 15.5, locy),
         locx = ifelse(squirrel_id == 8443, 2.5, locx),
         locy = ifelse(squirrel_id == 8443, 80, locy)) # etc. for 63 squirrels

This gives me:

squirrel_id    locx    locy    dist
6391           12.5    15.5    50.0
6391           12.5    15.5    20.0
6391           12.5    15.5    15.5
8443           2.5     80.0    800
6025           -5.0    -0.5    0.0
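A side note on the ifelse() approach: if the replacement values are written as quoted strings (for example "12.5"), ifelse() returns a character vector, so locx and locy silently become character columns. A small base R check, using made-up vectors shaped like the question's columns:

```r
# Numeric column and ids, shaped like the question's df
locx <- c(17.5, 17.5, 20.5)
ids  <- c(6391, 6391, 8443)

# Quoted replacement value: the whole result is coerced to character
as_text <- ifelse(ids == 6391, "12.5", locx)
class(as_text)   # "character"

# Unquoted replacement value: the result stays numeric
as_num <- ifelse(ids == 6391, 12.5, locx)
class(as_num)    # "numeric"
```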

But this takes an extra 126 lines of code (two per squirrel), and I suspect there is a simpler way to do this.

I do have all the new locx and locy values in a separate data frame, but I do not know how to join the two data frames by squirrel_id without messing up the data.

The data frame with the values that should replace the ones in the original df:

squirrel_id    new_locx    new_locy   
6391           12.5        15.5 
8443           2.5         80
6025           -55.0       0.0

How can I do this more efficiently?

2 Answers


You can left_join the two data frames and then use if_else to pick the right locx and locy. Try:

library(dplyr)
df %>% left_join(df2, by = "squirrel_id") %>%
        mutate(locx = if_else(is.na(new_locx), locx, new_locx), # as suggested by @echasnovski, we can also use locx = coalesce(new_locx, locx)
               locy = if_else(is.na(new_locy), locy, new_locy)) %>% # or locy = coalesce(new_locy, locy)
        select(-new_locx, -new_locy)
# output
  squirrel_id  locx locy  dist
1        6391  12.5 15.5  50.0
2        6391  12.5 15.5  20.0
3        6391  12.5 15.5  15.5
4        8443   2.5 80.0 800.0
5        6025 -55.0  0.0   0.0
6        5000  18.5 12.5  10.0 # squirrel_id 5000 was added as an example of an id
# present in df but not in df2

Data

df <- structure(list(squirrel_id = c(6391L, 6391L, 6391L, 8443L, 6025L, 
5000L), locx = c(17.5, 17.5, 17.5, 20.5, -5, 18.5), locy = c(10, 
10, 10, 1, -0.5, 12.5), dist = c(50, 20, 15.5, 800, 0, 10)), class = "data.frame", row.names = c(NA, 
-6L))
df2 <- structure(list(squirrel_id = c(6391L, 8443L, 6025L), new_locx = c(12.5, 
2.5, -55), new_locy = c(15.5, 80, 0)), class = "data.frame", row.names = c(NA, 
-3L))
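For comparison, the same lookup-and-fallback logic can be written in base R with match(), without dplyr. This is a sketch, not taken from either answer; it assumes squirrel_id is unique in df2 (if an id appears twice, match() silently uses the first occurrence). Because it only overwrites values in place, it never adds or drops rows of df.

```r
# Small versions of the question's tables (same values as the Data section)
df <- data.frame(
  squirrel_id = c(6391L, 6391L, 6391L, 8443L, 6025L, 5000L),
  locx = c(17.5, 17.5, 17.5, 20.5, -5, 18.5),
  locy = c(10, 10, 10, 1, -0.5, 12.5),
  dist = c(50, 20, 15.5, 800, 0, 10)
)
df2 <- data.frame(
  squirrel_id = c(6391L, 8443L, 6025L),
  new_locx = c(12.5, 2.5, -55),
  new_locy = c(15.5, 80, 0)
)

# For each row of df, the index of its row in df2 (NA if the id is absent)
idx <- match(df$squirrel_id, df2$squirrel_id)
hit <- !is.na(idx)

# Overwrite only rows that have a replacement; the others keep their values
df$locx[hit] <- df2$new_locx[idx[hit]]
df$locy[hit] <- df2$new_locy[idx[hit]]
df
```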

2 Comments

Note that instead of if_else(is.na(x), y, x) you can use coalesce(x, y).
Thank you for pointing that out, @echasnovski; I will edit my post.
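To see why coalesce(new_locx, locx) does the same thing as if_else(is.na(new_locx), locx, new_locx): coalesce keeps its first argument and falls back to the second wherever the first is NA. The helper coalesce2 below is hypothetical, a two-vector base R stand-in for illustration only; dplyr::coalesce also accepts more than two vectors and is stricter about types.

```r
# Hypothetical two-vector stand-in for dplyr::coalesce(x, y):
# take x, but use y wherever x is NA
coalesce2 <- function(x, y) ifelse(is.na(x), y, x)

new_locx <- c(12.5, NA, -55)   # NA = the join found no replacement for this row
locx     <- c(17.5, 18.5, -5)

coalesce2(new_locx, locx)      # 12.5 18.5 -55.0
```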

Using @ANG's data, here's a data.table solution. It joins and updates the original df by reference.

library(data.table)

setDT(df)
setDT(df2)

df[df2, on = c('squirrel_id'), `:=` (locx = new_locx, locy = new_locy) ]

df

   squirrel_id  locx locy  dist
1:        6391  12.5 15.5  50.0
2:        6391  12.5 15.5  20.0
3:        6391  12.5 15.5  15.5
4:        8443   2.5 80.0 800.0
5:        6025 -55.0  0.0   0.0
6:        5000  18.5 12.5  10.0

See also:

how to use merge() to update a table in R

Replace a subset of a data frame with dplyr join operations

R: Updating a data frame with another data frame
