How to replace values in dataframe based on a second dataframe in R?

Question

I have a dataframe df1 with multiple columns, each column representing a species name (sp1, sp2, sp3, ...).

df1

  sp1   sp2   sp3    sp4 
  NA    NA    r1      r1
  NA    NA    1       3
  NA    5     NA      NA
  m4    NA    NA      m2

I would like to replace each value in df1 using a second dataframe, df2: each value in df1 should be looked up in df2$scale_nr and replaced by the corresponding df2$percentage. The result should be df1 with every code translated into its percentage from df2.

df2

scale_nr   percentage
r1          1
p1          1
a1          1
m1          1
r2          2
p2          2
a2          2
m2          2
1           10
2           20
3           30
4           40
...

After the replacement, df1 should look like this:

df1

  sp1   sp2   sp3    sp4
  NA    NA    1       1
  NA    NA    10      30
  NA    50    NA      NA
  4     NA    NA      2

I tried this:

df2$percentage[match(df1$sp1, df2$scale_nr)] # this one works for one column

This works for one column. I know there should be an easy way to do this over all columns, but somehow I can't figure it out.
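Why the one-column version works: match() returns, for each element of the column, the position of the first matching entry in df2$scale_nr (or NA when there is none), and indexing df2$percentage with those positions performs the translation. A small sketch on column sp4, assuming df2 holds only the rows visible above; idx is an illustrative name.

```r
df2 <- data.frame(
  scale_nr   = c("r1", "p1", "a1", "m1", "r2", "p2", "a2", "m2", "1", "2", "3", "4"),
  percentage = c(1, 1, 1, 1, 2, 2, 2, 2, 10, 20, 30, 40),
  stringsAsFactors = FALSE
)
sp4 <- c("r1", "3", NA, "m2")

idx <- match(sp4, df2$scale_nr)  # row positions in df2; NA where no row matches
idx                              # 1 11 NA 8
df2$percentage[idx]              # 1 30 NA 2
```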

I know I could do it by hand, one value at a time, like

df1[df1 == 'Old Value'] <- 'New value'

but that seems very inefficient, because I have 40 different values that need to be replaced.
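Instead of 40 separate replacements, one lookup table can translate every code at once. This sketch swaps in a named vector as the lookup table (an alternative to match(), not something from the question); it assumes the sample data with only the visible df2 rows, and lookup is a hypothetical name.

```r
df1 <- data.frame(
  sp1 = c(NA, NA, NA, "m4"),
  sp2 = c(NA, NA, "5", NA),
  sp3 = c("r1", "1", NA, NA),
  sp4 = c("r1", "3", NA, "m2"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  scale_nr   = c("r1", "p1", "a1", "m1", "r2", "p2", "a2", "m2", "1", "2", "3", "4"),
  percentage = c(1, 1, 1, 1, 2, 2, 2, 2, 10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Build the lookup once: names are the codes, values are the percentages
lookup <- setNames(df2$percentage, df2$scale_nr)

# Index by name. as.character() matters: indexing with a number (e.g. an
# integer column holding 5) would select by position, not by name
df1[] <- lapply(df1, function(z) unname(lookup[as.character(z)]))
df1
```

Codes without a name in lookup (here m4 and 5) come back as NA, just as with match().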

Can someone please help me with a solution for this?

r2evans · Accepted Answer · 2022-02-17 16:52:35Z

You can use lapply on the frame to apply the same lookup to every column.

df1[] <- lapply(df1, function(z) df2$percentage[match(z, df2$scale_nr)])
df1
#   sp1 sp2 sp3 sp4
# 1  NA  NA   1   1
# 2  NA  NA  10  30
# 3  NA  NA  NA  NA
# 4  NA  NA  NA   2

The extra NAs (for the codes 5 and m4) are most likely because df2 is truncated in the sample data: those codes have no matching row in the part that is shown.
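To confirm that the NAs come from codes missing in df2 rather than from the code itself, one can list the codes in df1 that have no row in df2. A sketch assuming the sample data (df2 with only the visible rows); codes and missing_codes are illustrative names.

```r
df1 <- data.frame(
  sp1 = c(NA, NA, NA, "m4"),
  sp2 = c(NA, NA, "5", NA),
  sp3 = c("r1", "1", NA, NA),
  sp4 = c("r1", "3", NA, "m2"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  scale_nr   = c("r1", "p1", "a1", "m1", "r2", "p2", "a2", "m2", "1", "2", "3", "4"),
  percentage = c(1, 1, 1, 1, 2, 2, 2, 2, 10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# All non-missing codes in df1 as one vector
codes <- unlist(df1, use.names = FALSE)
codes <- codes[!is.na(codes)]

# Codes with no row in df2; these are the ones that become NA
missing_codes <- setdiff(codes, df2$scale_nr)
missing_codes  # "m4" "5"
```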

If you want to keep the original value when it is not found in df2, you can modify that slightly:

df1[] <- lapply(df1, function(z) {
  newval <- df2$percentage[match(z, df2$scale_nr)]
  ifelse(is.na(newval), z, newval)
})
df1
#    sp1 sp2  sp3  sp4
# 1 <NA>  NA    1    1
# 2 <NA>  NA   10   30
# 3 <NA>   5 <NA> <NA>
# 4   m4  NA <NA>    2
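One side effect of the preserving version: ifelse() mixes the original character codes with the numeric percentages, so the affected columns come out as character (the <NA> in the printout hints at this). A sketch assuming all df1 columns start as character and df2 holds only the visible rows; it uses base R's type.convert() to turn columns that now look purely numeric back into numbers.

```r
df1 <- data.frame(
  sp1 = c(NA, NA, NA, "m4"),
  sp2 = c(NA, NA, "5", NA),
  sp3 = c("r1", "1", NA, NA),
  sp4 = c("r1", "3", NA, "m2"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  scale_nr   = c("r1", "p1", "a1", "m1", "r2", "p2", "a2", "m2", "1", "2", "3", "4"),
  percentage = c(1, 1, 1, 1, 2, 2, 2, 2, 10, 20, 30, 40),
  stringsAsFactors = FALSE
)

df1[] <- lapply(df1, function(z) {
  newval <- df2$percentage[match(z, df2$scale_nr)]
  ifelse(is.na(newval), z, newval)  # keep the old code where there is no match
})

# All four columns are character here, even sp4 ("1", "30", NA, "2")
sapply(df1, class)

# Convert columns whose entries all look numeric; sp1 keeps "m4" and stays character
df1 <- type.convert(df1, as.is = TRUE)
sapply(df1, class)
```

Note that with the truncated df2 the untranslated code 5 in sp2 also becomes the number 5, so after conversion a preserved code and a translated percentage can no longer be told apart by type.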

FYI, assigning into df1[] <- is important, in contrast with df1 <-. The difference is that lapply returns a list, so if you use df1 <- then df1 will no longer be a data.frame. With df1[] <-, you are telling R to replace the contents of the columns without changing the overall class of df1.
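A small sketch of that difference, using a hypothetical two-column df1 and a three-row df2 taken from the question's codes; translate and as_list are illustrative names.

```r
df1 <- data.frame(sp1 = c("r1", "3"), sp2 = c("m2", NA), stringsAsFactors = FALSE)
df2 <- data.frame(scale_nr = c("r1", "m2", "3"), percentage = c(1, 2, 30),
                  stringsAsFactors = FALSE)

translate <- function(z) df2$percentage[match(z, df2$scale_nr)]

as_list <- lapply(df1, translate)  # what "df1 <- lapply(...)" would store
class(as_list)                     # "list"

df1[] <- lapply(df1, translate)    # fills the existing columns in place
class(df1)                         # "data.frame"
```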

If you need to do this on only a subset of columns, that's easy:

df1[1:3] <- lapply(df1[1:3], ...)
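Columns can also be selected by name, which is less fragile than positions if the column order changes. A sketch assuming the sample data (df2 with only the visible rows); translate and cols are hypothetical names and the choice of sp3 and sp4 is arbitrary.

```r
df1 <- data.frame(
  sp1 = c(NA, NA, NA, "m4"),
  sp2 = c(NA, NA, "5", NA),
  sp3 = c("r1", "1", NA, NA),
  sp4 = c("r1", "3", NA, "m2"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  scale_nr   = c("r1", "p1", "a1", "m1", "r2", "p2", "a2", "m2", "1", "2", "3", "4"),
  percentage = c(1, 1, 1, 1, 2, 2, 2, 2, 10, 20, 30, 40),
  stringsAsFactors = FALSE
)

translate <- function(z) df2$percentage[match(z, df2$scale_nr)]

cols <- c("sp3", "sp4")                    # only these columns are translated
df1[cols] <- lapply(df1[cols], translate)  # sp1 and sp2 are left untouched
df1
```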

1 Comment

Pinla · over a year ago

Thank you so much! Especially the df[] <- part, which I was not aware of and which did the trick.
