35

ETA: the point of the below, by the way, is to avoid iterating through my entire set of column vectors, just in case that was going to be a proposed solution (i.e., doing what is known to work one column at a time).


There are plenty of examples of replacing values in a single vector of a data frame in R with some other value.

And also how to replace all values of NA with something else:

What I'm looking for is analogous to the last question, but I'm trying to replace one specific value with another. I'm having trouble generating a data frame of logical values mapped to my actual data frame for cases where multiple columns meet a criterion, or simply doing the actions from the first two questions on more than one column.

An example:

data <- data.frame(name = rep(letters[1:3], each = 3), var1 = rep(1:9), var2 = rep(3:5, each = 3))

data
  name var1 var2
1    a    1    3
2    a    2    3
3    a    3    3
4    b    4    4
5    b    5    4
6    b    6    4
7    c    7    5
8    c    8    5
9    c    9    5

And say I want all of the values of 4 in var1 and var2 to be 10.

I'm sure this is elementary and I'm just not thinking through it properly. I have been trying things like:

data[data[, 2:3] == 4, ]

That doesn't work, but if I do the same with data[, 2] instead of data[, 2:3], things work fine. It seems that logical tests (like is.na()) work on multiple rows/columns, but that numerical comparisons aren't playing as nicely?

5 Answers

76

you want to search through the whole data frame for any value that matches the one you're trying to replace. it works the same way as a logical test such as is.na(), which you can use to replace all missing values with 10..

data[ is.na( data ) ] <- 10

you can also replace all 4s with 10s.

data[ data == 4 ] <- 10

at least i think that's what you're after?
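A quick sketch of why the two lines above behave alike, using the example data from the question: both is.na(data) and data == 4 should return a logical matrix with the same shape as data, so either one can be used as an index. The names na_mask and eq_mask are only for illustration.

```r
# Example data from the question
data <- data.frame(name = rep(letters[1:3], each = 3),
                   var1 = 1:9,
                   var2 = rep(3:5, each = 3))

na_mask <- is.na(data)   # logical matrix, one cell per cell of data
eq_mask <- data == 4     # same shape, TRUE wherever a cell equals 4

dim(na_mask)
dim(eq_mask)

# Indexing with a mask of the same shape replaces only the TRUE cells
data[eq_mask] <- 10
data
```

The name column is compared as well, but since no letter equals 4 it is left untouched.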

and let's say you wanted to ignore the first column (since it's all letters)

# identify which columns contain the values you might want to replace
data[ , 2:3 ]

# subset it with extended bracketing..
data[ , 2:3 ][ data[ , 2:3 ] == 4 ]
# ..those were the values you're going to replace

# now overwrite 'em with tens
data[ , 2:3 ][ data[ , 2:3 ] == 4 ] <- 10

# look at the final data
data

1 Comment

I flipping swear I tried this and it wasn't working for me before. I hope to get to the point where I don't kick myself every time I post to SO... By the way, you're the 1min R video guy, aren't you!? Those rock.
7

Basically, data[, 2:3] == 4 gave you a logical index for data[, 2:3] (two columns) rather than for data (three columns):

R > data[, 2:3] ==4
       var1  var2
 [1,] FALSE FALSE
 [2,] FALSE FALSE
 [3,] FALSE FALSE
 [4,]  TRUE  TRUE
 [5,] FALSE  TRUE
 [6,] FALSE  TRUE
 [7,] FALSE FALSE
 [8,] FALSE FALSE
 [9,] FALSE FALSE

So you may try this:

R > data[,2:3][data[, 2:3] ==4]
[1] 4 4 4 4
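To make the shape mismatch concrete, here is a small sketch on the question's example data. The variable names mask and hits are just for illustration; which(..., arr.ind = TRUE) is used only to list the matching positions.

```r
# Example data from the question
data <- data.frame(name = rep(letters[1:3], each = 3),
                   var1 = 1:9,
                   var2 = rep(3:5, each = 3))

mask <- data[, 2:3] == 4

dim(mask)   # 9 rows, 2 columns: lines up with data[, 2:3]
dim(data)   # 9 rows, 3 columns: does not line up with mask

# Row/column positions of the matches, relative to data[, 2:3]
hits <- which(mask, arr.ind = TRUE)
hits
```

Because the mask describes positions inside data[, 2:3], it has to be applied to that same two-column subset, not used as a row index on the full data frame.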

1 Comment

Thanks for this; also works. I just think the one from Anthony is a tad simpler. Big thanks for explaining why mine wasn't working though; after playing around some more, I see what you mean: me trying to apply values to data based on a comparison that was also subsetting makes a lot more sense.
2

Just to provide a different answer, I thought I would write up a vector-math approach:

You can create a transformation matrix of multipliers (1 where a value should stay as it is, .sub_Value/.search_Value where it should change) using the vectorized 'ifelse' function, and multiply it element-wise by your original data, like so:

df.Rep <- function(.data_Frame, .search_Columns, .search_Value, .sub_Value){
  # Multiplier is .sub_Value/.search_Value where the value matches, 1 elsewhere
  .data_Frame[, .search_Columns] <- ifelse(.data_Frame[, .search_Columns] == .search_Value, .sub_Value / .search_Value, 1) * .data_Frame[, .search_Columns]
  return(.data_Frame)
}

To replace all values 4 with 10 in the data frame 'data' in columns 2 through 3, you would use the function like so:

# Either of these will work.  I'm just showing options.
df.Rep(data, 2:3, 4, 10)
df.Rep(data, c("var1","var2"), 4, 10)

#   name var1 var2
# 1    a    1    3
# 2    a    2    3
# 3    a    3    3
# 4    b   10   10
# 5    b    5   10
# 6    b    6   10
# 7    c    7    5
# 8    c    8    5
# 9    c    9    5
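One thing to keep in mind with the multiplication trick is that it relies on dividing by the search value, so searching for 0 would give Inf * 0, which is NaN. As a hedged alternative with the same interface, the sketch below uses logical indexing instead. replace_value is a hypothetical helper name, not something from the answer above.

```r
# Hypothetical helper: same arguments as df.Rep, but uses logical indexing
replace_value <- function(df, cols, old, new) {
  sub <- df[cols]          # keep the selected columns as a data frame
  sub[sub == old] <- new   # overwrite only the matching cells
  df[cols] <- sub
  df
}

# Example data from the question
data <- data.frame(name = rep(letters[1:3], each = 3),
                   var1 = 1:9,
                   var2 = rep(3:5, each = 3))
out <- replace_value(data, 2:3, 4, 10)
out

# Searching for 0: the multiplier approach produces NaN, indexing does not
zeros <- data.frame(x = c(0, 1, 0))
by_mult <- ifelse(zeros$x == 0, 5 / 0, 1) * zeros$x
fixed <- replace_value(zeros, "x", 0, 5)
```

Logical indexing should also work when the columns are not numeric, which multiplication cannot handle.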

1 Comment

test should be data, no? :)
1

Just for continuity

    data[,2:3][ data[,2:3] == 4 ] <- 10

But it looks ugly, so doing it in two steps is better.

0

Tidyverse

Here is a dplyr solution:

library(dplyr)

data |> 
  mutate(across(var1:var2, \(x) replace(x, x == 4, 10)))
#   name var1 var2
# 1    a    1    3
# 2    a    2    3
# 3    a    3    3
# 4    b   10   10
# 5    b    5   10
# 6    b    6   10
# 7    c    7    5
# 8    c    8    5
# 9    c    9    5

The first argument of across() is the columns you want to modify with a function. There are a number of handy tidy-selection helpers so you can easily pick multiple columns to modify.

Here I used a range from var1 to var2 (which are right next to each other). This could have been written as c(var1, var2) if, for example, these columns were not next to one another.
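For comparison, roughly the same per-column idea can be sketched in base R without dplyr, applying replace() to each selected column with lapply(). The cols vector is an illustrative name; this is a sketch on the question's example data, not a drop-in equivalent of across().

```r
# Example data from the question
data <- data.frame(name = rep(letters[1:3], each = 3),
                   var1 = 1:9,
                   var2 = rep(3:5, each = 3))

cols <- c("var1", "var2")   # columns to modify

# Apply the same replacement function to each selected column
data[cols] <- lapply(data[cols], function(x) replace(x, x == 4, 10))
data
```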
