For each row, replace values from specific columns (defined by another dataframe), with a value from a vector

Question

Let's say we have:

set.seed(42)
df1 <- data.frame(v1=rnorm(10) , v2=rnorm(10), v3=rnorm(10), v4=rnorm(10))

as well as

df2 <- data.frame(v1=rnorm(10) , v2=rnorm(10), v3=rnorm(10), v4=rnorm(10))
vector <- c(17,21,33,41,50,63,72,81,91,10)

df1 and df2 have the same column names, and df2 is generated by processing df1.

For each row in df2, I would like to replace a value that meets the condition < 0.5 in df1, with the corresponding value of the vector.

For example, if any of the columns of the first row in df1 has a value lower than 0.5, then the corresponding column(s) of the first row in df2 will have to be replaced with the first element of the vector, that is 17. For the second row, they will be replaced with 21 etc.

I picture some apply and a custom made function would do the trick but I am not able to figure it out. Thank you in advance for the solution.

markus · Accepted Answer · 2020-06-22 05:48:30Z

3

1)

My approach was:

idx <- df1 < .5
tmp <- idx * vector
df2[idx] <- tmp[idx]

2)

A second option provided by @MartinGal in the comments:

df2 * (df1>=0.5) + (df1<0.5) * vector

Result (option 1 modifies df2 in place; option 2 returns the same values as a new data frame):

df2
#           v1            v2          v3         v4
#1  -1.4936251  5.676206e-01 -0.08610730 17.0000000
#2  21.0000000  2.100000e+01 -0.88767902 21.0000000
#3  33.0000000  6.288407e-05 33.00000000 33.0000000
#4  41.0000000  1.122890e+00 -0.02944488 41.0000000
#5  50.0000000  5.000000e+01 50.00000000 50.0000000
#6  -0.4282589  6.300000e+01 63.00000000 63.0000000
#7  72.0000000  7.200000e+01 72.00000000 72.0000000
#8  81.0000000  8.100000e+01 81.00000000 -0.8002822
#9  -1.2247480  9.100000e+01 91.00000000 91.0000000
#10  0.1795164 -5.246948e-02 10.00000000 10.0000000
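To check that the two options really agree, here is a minimal sketch that rebuilds the question's inputs and compares both results. The names res1 and res2 are only illustrative, and the printed numbers depend on the random draws, so the sketch compares the two results with each other rather than against specific values.

```r
set.seed(42)
df1 <- data.frame(v1 = rnorm(10), v2 = rnorm(10), v3 = rnorm(10), v4 = rnorm(10))
df2 <- data.frame(v1 = rnorm(10), v2 = rnorm(10), v3 = rnorm(10), v4 = rnorm(10))
vector <- c(17, 21, 33, 41, 50, 63, 72, 81, 91, 10)

# Option 2 returns a new data frame and leaves df2 untouched
res2 <- df2 * (df1 >= 0.5) + (df1 < 0.5) * vector

# Option 1, applied to a copy so that df2 itself stays available
res1 <- df2
idx <- df1 < 0.5                   # logical matrix, same shape as df1
res1[idx] <- (idx * vector)[idx]   # vector is recycled down each column

# Compare the numeric contents of both results
all.equal(as.matrix(res1), as.matrix(res2))
```

One caveat with option 2: it relies on multiplying by 0, so an NA or Inf in df2 at a position that should be replaced would turn into NA or NaN instead of the replacement value. Option 1 assigns directly and does not have that problem.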

We first check at which positions df1 is < .5 and multiply this logical matrix by vector. Because vector has one element per row, it is recycled down each column, which gives this matrix:

idx <- df1 < .5
tmp <- (idx) * vector
tmp
#      v1 v2 v3 v4
# [1,]  0  0  0 17
# [2,] 21 21  0 21
# [3,] 33  0 33 33
# [4,] 41  0  0 41
# [5,] 50 50 50 50
# [6,]  0 63 63 63
# [7,] 72 72 72 72
# [8,] 81 81 81  0
# [9,]  0 91 91 91
#[10,]  0  0 10 10

These are the values you want to insert in df2 at the positions where idx is TRUE.

So the next step is to replace those values in df2 using a logical matrix, i.e. idx:

df2[idx] <- tmp[idx]
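The same steps on a tiny hand-made example, so the recycling can be followed by eye. The data frames a and b and the vector repl are hypothetical stand-ins for df1, df2 and vector:

```r
# Hypothetical small inputs (not the question's data)
a <- data.frame(v1 = c(0.1, 0.9, 0.7), v2 = c(0.8, 0.2, 0.3))
b <- data.frame(v1 = c(1, 2, 3), v2 = c(4, 5, 6))
repl <- c(10, 20, 30)   # one replacement value per row

idx <- a < 0.5          # TRUE where a value in a is below the threshold
tmp <- idx * repl       # TRUE/FALSE act as 1/0; repl is recycled per column
b[idx] <- tmp[idx]      # overwrite only the flagged cells of b
b
#   v1 v2
# 1 10  4
# 2  2 20
# 3  3 30
```

This works because length(repl) equals the number of rows, so element i of repl always lines up with row i in every column.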

edited Jun 22, 2020 at 5:48

answered Jun 20, 2020 at 20:09

markus


3 Comments

Martin Gal Over a year ago

Brilliant idea! If I'm not mistaken this can be "simplified" to df2 * (df1>=0.5) + (df1<0.5) * vector

markus Over a year ago

@MartinGal I think your approach is excellent. Please post as answer.

Martin Gal Over a year ago

I just used your approach so it's basically just a variation of your answer. :-)

akrun · Accepted Answer · 2020-06-20 21:10:40Z

1

We can also use Map from base R

data.frame(Map(function(x, y) ifelse(x < 0.5, vector, y) , df1, df2))

Or using map2 from purrr

library(purrr)
library(dplyr)  # case_when() comes from dplyr
map2_df(df1, df2, ~ case_when(.x < 0.5 ~ vector, TRUE ~ .y))
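Both variants work column by column: each column of df1 is paired with the matching column of df2, and ifelse (or case_when) picks the replacement value row by row. A minimal base-R sketch on hypothetical data (a, b and repl are made-up stand-ins for df1, df2 and vector):

```r
# Hypothetical small inputs (not the question's data)
a <- data.frame(v1 = c(0.1, 0.9, 0.7), v2 = c(0.8, 0.2, 0.3))
b <- data.frame(v1 = c(1, 2, 3), v2 = c(4, 5, 6))
repl <- c(10, 20, 30)

# Map() walks over the columns of a and b in parallel;
# ifelse() then chooses repl[i] or y[i] for each row i
out <- data.frame(Map(function(x, y) ifelse(x < 0.5, repl, y), a, b))
out
#   v1 v2
# 1 10  4
# 2  2 20
# 3  3 30
```

Unlike df2[idx] <- tmp[idx], this builds a new data frame, so the result has to be assigned back if df2 itself should change.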

edited Jun 20, 2020 at 21:10

answered Jun 20, 2020 at 20:46

akrun
