Efficient way to replace value in binary column with 1 in R

Question

I have the data frame below. I want to find its binary columns (columns with exactly two distinct values) and change every non-empty value in them to 1.

a <- c("","a","a","","a")
b <- c("","b","b","b","b")
c <- c("c","","","","c")
d <- c("b","a","","c","d")

dt <- data.frame(a,b,c,d)

I can get the result by looping over each column, but my data frame is very large and the solution below is far too slow, so I am looking for something more efficient.

My solution:

for (i in seq_along(dt)) {
  # A column counts as binary when it has exactly two distinct values
  if (length(table(dt[, i])) == 2) {
    dt[which(dt[, i] != ""), i] <- 1
  }
}

Expected Output:

 a b c d
     1 b
 1 1   a
 1 1    
   1   c
 1 1 1 d

Is there a way to make it more efficient?

If you are looking at the "length" of individual cells, then you need nchar, not length. Do you want to replace the empty values with NA, 0, or something else? (It would really help if you provided your expected output.)
– r2evans, Commented Feb 14, 2018 at 20:10
Your code looks mostly fine. I would just suggest that length(unique(dt[, i])) == 2 will probably be faster than table(). If, as in your sample data, your columns are already factors, you could do a little better by reassigning the levels.
– Gregor Thomas, Commented Feb 14, 2018 at 20:11
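
The level-reassignment idea from the comment above could look roughly like the sketch below. It assumes the columns are factors, which the sample data is built as explicitly here (that was the data.frame() default before R 4.0). Note that levels() also counts unused levels, so this test is not always identical to length(unique(...)) == 2.

```r
# Sample data from the question, forced to factors for this sketch
dt <- data.frame(
  a = c("", "a", "a", "", "a"),
  b = c("", "b", "b", "b", "b"),
  c = c("c", "", "", "", "c"),
  d = c("b", "a", "", "c", "d"),
  stringsAsFactors = TRUE
)

for (i in seq_along(dt)) {
  lv <- levels(dt[[i]])
  # Treat a column as binary when it has exactly two levels
  if (length(lv) == 2L) {
    # Relabel every non-empty level as "1"; only the levels vector
    # is rewritten, not the individual rows
    levels(dt[[i]])[lv != ""] <- "1"
  }
}
dt
```

Because only the short levels vector changes, the cost per column does not depend on finding and overwriting each matching row.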

Julien Navarre · Accepted Answer · 2018-02-14 23:16:08Z

2

Since your concern seems to be efficiency, you may want to look at packages like dplyr or data.table.

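library(dplyr)
# Assumes character columns (the data.frame() default since R 4.0); if_else() is type-strict
mutate(dt, across(everything(), ~ if_else(n_distinct(.x) <= 2L & .x != "", "1", .x)))

library(data.table)
setDT(dt)
# Returns a new data.table; dt itself is not modified
dt[ , lapply(.SD, function(x) ifelse(uniqueN(x) <= 2L & x != "", "1", x))]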

edited Feb 14, 2018 at 23:16

answered Feb 14, 2018 at 21:03

Julien Navarre


1 Comment

IceCreamToucan Over a year ago

If using data.table, you can use the fast uniqueN(x) in place of length(unique(x)).
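
If memory matters as much as speed, one possible variation is to update the table by reference with data.table's set(), using uniqueN() as suggested in the comment above. This is only a sketch on the question's sample data; binary_cols and rows are illustrative names, and the test is "exactly two distinct values", as in the question's loop.

```r
library(data.table)

# Sample data from the question, as character columns
dt <- data.table(
  a = c("", "a", "a", "", "a"),
  b = c("", "b", "b", "b", "b"),
  c = c("c", "", "", "", "c"),
  d = c("b", "a", "", "c", "d")
)

# Names of the columns that have exactly two distinct values
binary_cols <- names(dt)[vapply(dt, uniqueN, integer(1)) == 2L]

for (col in binary_cols) {
  # Row indices of the non-empty cells in this column
  rows <- which(dt[[col]] != "")
  # set() overwrites those cells in place, without copying the table
  set(dt, i = rows, j = col, value = "1")
}
dt
```

Unlike the lapply(.SD, ...) call, this modifies dt itself, so nothing needs to be reassigned afterwards.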

d.b · Accepted Answer · 2018-02-14 20:22:02Z

2

inds = lengths(lapply(dt, unique)) == 2
dt[inds] = lapply(dt[inds], function(x) as.numeric(as.character(x) != ""))
dt
#  a b c d
#1 0 0 1 b
#2 1 1 0 a
#3 1 1 0  
#4 0 1 0 c
#5 1 1 1 d

If you want "" instead of 0:

dt[inds] = lapply(dt[inds], function(x) c("", 1)[(as.character(x) != "") + 1])
dt
#  a b c d
#1     1 b
#2 1 1   a
#3 1 1    
#4   1   c
#5 1 1 1 d

edited Feb 14, 2018 at 20:22

answered Feb 14, 2018 at 20:18

d.b


3 Comments

Rushabh Patel Over a year ago

Much better. The only difference is that I get 0s instead of empty strings.

IceCreamToucan Over a year ago

nlevels would be useful here

d.b Over a year ago

@Renu, unique would work even if the columns are not factors.
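
A small illustration of the nlevels versus unique point raised in these comments, using made-up vectors x, y and z that are not part of the original answer:

```r
x <- factor(c("", "a", "a", "", "a"))
nlevels(x)          # 2
length(unique(x))   # 2

# After subsetting, unused levels are kept, so the two counts can differ
y <- x[x != ""]
nlevels(y)          # 2: the "" level is still defined
length(unique(y))   # 1: only "a" actually occurs

# nlevels() returns 0 for a non-factor, while unique() still works
z <- c("", "a", "a", "", "a")
nlevels(z)          # 0
length(unique(z))   # 2
```

So nlevels() is the cheaper check when every column is a factor with no unused levels, and length(unique()) is the safer one otherwise.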
