I have a dataframe df1 with multiple columns, each column representing a species name (sp1, sp2, sp3, ...).

df1

  sp1   sp2   sp3    sp4 
  NA    NA    r1      r1
  NA    NA    1       3
  NA    5     NA      NA
  m4    NA    NA      m2

I would like to replace each value in df1 using a second dataframe, df2. Each value in df1 should be matched against df2$scale_nr and replaced by the corresponding df2$percentage, so that df1 ends up containing only percentages.

df2

scale_nr   percentage
r1          1
p1          1
a1          1
m1          1
r2          2
p2          2
a2          2
m2          2
1           10
2           20
3           30
4           40
...

Then after replacement df1 should look like

df1

  sp1   sp2   sp3    sp4
  NA    NA    1       1
  NA    NA    10      30
  NA    50    NA      NA
  4     NA    NA      2

I tried this:

df2$percentage[match(df1$sp1, df2$scale_nr)] # this one works for one column

This works for one column. I know I should be able to apply it to all columns easily, but somehow I can't figure out how.

I know I could do it by 'hand', like

df1[df1 == 'Old Value'] <- 'New value'

but this seems highly inefficient because I have 40 different values that need to be replaced.

Can someone please help me with a solution for this?

1 Answer

You can use lapply on the data frame to apply the same lookup to every column.

df1[] <- lapply(df1, function(z) df2$percentage[match(z, df2$scale_nr)])
df1
#   sp1 sp2 sp3 sp4
# 1  NA  NA   1   1
# 2  NA  NA  10  30
# 3  NA  NA  NA  NA
# 4  NA  NA  NA   2

The extra NA values (for example where df1 had 5 or m4) are likely because df2 is truncated in the sample data and has no rows for those codes.
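For a self-contained check, the sample data can be rebuilt as in the sketch below. It assumes every column of df1 is stored as character and that df2 holds only the twelve rows visible in the question (the real df2 presumably has more), so codes such as 5 and m4 have nothing to match.

```r
# Sample data reconstructed from the question (assumption: all columns are character).
df1 <- data.frame(
  sp1 = c(NA, NA, NA, "m4"),
  sp2 = c(NA, NA, "5", NA),
  sp3 = c("r1", "1", NA, NA),
  sp4 = c("r1", "3", NA, "m2"),
  stringsAsFactors = FALSE
)

# Only the rows of df2 shown in the question; the elided rows are not guessed.
df2 <- data.frame(
  scale_nr   = c("r1", "p1", "a1", "m1", "r2", "p2", "a2", "m2", "1", "2", "3", "4"),
  percentage = c(1, 1, 1, 1, 2, 2, 2, 2, 10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# match() gives the row of df2 for each value (NA when absent);
# indexing percentage by that position performs the lookup.
df1[] <- lapply(df1, function(z) df2$percentage[match(z, df2$scale_nr)])

df1$sp3  # 1 10 NA NA
df1$sp4  # 1 30 NA  2
```

Values with no row in df2 come back as NA, which is why sp1 and sp2 are entirely NA here.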

If you want to keep the original value whenever it is not found in df2, you can modify the function slightly:

df1[] <- lapply(df1, function(z) {
  newval <- df2$percentage[match(z, df2$scale_nr)]
  ifelse(is.na(newval), z, newval)
})
df1
#    sp1 sp2  sp3  sp4
# 1 <NA>  NA    1    1
# 2 <NA>  NA   10   30
# 3 <NA>   5 <NA> <NA>
# 4   m4  NA <NA>    2

FYI, the reassignment into df1[] <- is important, in contrast with df1 <-. The difference is that lapply returns a list, so if you use df1 <- then df1 will no longer be a data.frame. Using df1[] <-, you are telling R to replace the contents of the columns without changing the overall class of df1.
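The difference between the two assignment forms can be seen on a toy frame. This is only an illustrative sketch: the names df, as_list and kept are made up for the example, and toupper stands in for the lookup function.

```r
# Hypothetical two-column frame used only for this illustration.
df <- data.frame(a = c("x", "y"), b = c("y", "z"), stringsAsFactors = FALSE)

# Plain assignment of the lapply result yields a list, not a data.frame.
as_list <- lapply(df, toupper)
class(as_list)  # "list"

# Assigning into kept[] replaces the column contents and keeps the class.
kept <- df
kept[] <- lapply(df, toupper)
class(kept)     # "data.frame"
```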

If you need to do this on only a subset of columns, that's easy:

df1[1:3] <- lapply(df1[1:3], ...)
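Selecting the columns by name instead of by position works the same way. The sketch below uses a cut-down df1 and df2 and a hypothetical cols vector; none of these values come from the full data in the question.

```r
# Reduced sample data (assumption: character columns, a four-row lookup table).
df1 <- data.frame(
  sp1 = c("r1", "1"),
  sp2 = c("m2", "3"),
  sp3 = c("r1", "m2"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  scale_nr   = c("r1", "m2", "1", "3"),
  percentage = c(1, 2, 10, 30),
  stringsAsFactors = FALSE
)

# Hypothetical choice of columns to recode; sp3 is left untouched.
cols <- c("sp1", "sp2")
df1[cols] <- lapply(df1[cols], function(z) df2$percentage[match(z, df2$scale_nr)])

df1$sp1  # 1 10
df1$sp2  # 2 30
df1$sp3  # "r1" "m2"
```

Only the selected columns become numeric; the others keep their original type and values.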

1 Comment

Thank you so much! I was not aware of df1[] <- in particular, and it did the trick.
