I have the data frame below. I want to find the binary columns (those with exactly two distinct values) and change every non-empty value in them to 1.

a <- c("","a","a","","a")
b <- c("","b","b","b","b")
c <- c("c","","","","c")
d <- c("b","a","","c","d")

dt <- data.frame(a,b,c,d)

I can get the result by looping over each column, but my data frame is very large and the solution below is too slow, so I am looking for something more efficient.

My solution:

for (i in seq_along(dt)) {
  # a column counts as binary when it holds exactly two distinct values
  if (length(table(dt[, i])) == 2) {
    dt[which(dt[, i] != ""), i] <- 1
  }
}

Expected Output:

 a b c d
     1 b
 1 1   a
 1 1    
   1   c
 1 1 1 d

Is there a way to make it more efficient?

3 Comments
  • If you are looking at the "length" of individual cells, then you need nchar not length. Do you want to replace the empty values with NA, 0, or something else? (It would really help if you provided your expected output.) Commented Feb 14, 2018 at 20:10
  • Your code looks mostly fine. I would just suggest that length(unique(dt[, 1])) == 2 will probably be faster than table(). If, as in your sample data, your columns are already factors, you could do a little better by reassigning the levels. Commented Feb 14, 2018 at 20:11
  • @d.b your code doesn't check for binary columns. Commented Feb 14, 2018 at 20:12
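
The suggestion to swap table() for length(unique()) can be sketched as below. This is only a minimal sketch, assuming the columns are character vectors (the data.frame() default since R 4.0); `is_binary` is an illustrative name, not something from the original post.

```r
# Sample data from the question, kept as character columns
dt <- data.frame(
  a = c("", "a", "a", "", "a"),
  b = c("", "b", "b", "b", "b"),
  c = c("c", "", "", "", "c"),
  d = c("b", "a", "", "c", "d"),
  stringsAsFactors = FALSE
)

# Flag columns with exactly two distinct values (hypothetical helper name)
is_binary <- vapply(dt, function(x) length(unique(x)) == 2L, logical(1))

# Overwrite the non-empty cells in the flagged columns only
for (j in which(is_binary)) {
  dt[[j]][dt[[j]] != ""] <- "1"
}
```

Note that unique() counts NA as a distinct value, while table() drops NA by default, so the two checks can disagree on columns that contain missing values.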

2 Answers

Since your concern seems to be efficiency, you may want to look at packages like dplyr or data.table:

library(dplyr)
# formula lambda; as.character() guards against factor columns
mutate_all(dt, ~ if_else(n_distinct(.) <= 2L & . != "", "1", as.character(.)))

library(data.table)
setDT(dt)
# "1" and as.character(x) keep the result character even when x is a factor
dt[, lapply(.SD, function(x) ifelse(uniqueN(x) <= 2L & x != "", "1", as.character(x)))]
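
For a very large table, data.table can also update the columns by reference instead of building a new table. The following is a minimal sketch, assuming the data.table package is installed and the columns are character; `DT` is an illustrative name, and the `== 2L` check follows the question's definition of a binary column rather than the `<= 2L` used above.

```r
library(data.table)

# Sample data from the question (data.table() keeps strings as character)
DT <- data.table(
  a = c("", "a", "a", "", "a"),
  b = c("", "b", "b", "b", "b"),
  c = c("c", "", "", "", "c"),
  d = c("b", "a", "", "c", "d")
)

for (j in names(DT)) {
  x <- DT[[j]]
  if (uniqueN(x) == 2L) {
    # set() assigns in place, so the table itself is not copied
    set(DT, i = which(x != ""), j = j, value = "1")
  }
}
```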

1 Comment

If using data.table, you can use the fast uniqueN(x) in place of length(unique(x)).
2
inds = lengths(lapply(dt, unique)) == 2
dt[inds] = lapply(dt[inds], function(x) as.numeric(as.character(x) != ""))
dt
#  a b c d
#1 0 0 1 b
#2 1 1 0 a
#3 1 1 0  
#4 0 1 0 c
#5 1 1 1 d

If you want "" instead of 0:

dt[inds] = lapply(dt[inds], function(x) c("", 1)[(as.character(x) != "") + 1])
dt
#  a b c d
#1     1 b
#2 1 1   a
#3 1 1    
#4   1   c
#5 1 1 1 d

3 Comments

Much better; it's just that I will get 0s instead of empty strings.
nlevels would be useful here
@Renu, unique would work even if the columns are not factors.
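
For factor columns (the data.frame() default before R 4.0), the nlevels idea from the comments could look like the sketch below. It assumes every column is a factor and that "" is one of the two levels of a binary column; `df_f` and `binary` are illustrative names.

```r
# Two of the question's columns, stored as factors
df_f <- data.frame(
  a = c("", "a", "a", "", "a"),
  d = c("b", "a", "", "c", "d"),
  stringsAsFactors = TRUE
)

# nlevels() reads the stored level count without scanning the column
binary <- vapply(df_f, nlevels, integer(1)) == 2L

# Relabel the non-empty level; only the levels attribute is rewritten
for (j in names(df_f)[binary]) {
  lv <- levels(df_f[[j]])
  levels(df_f[[j]])[lv != ""] <- "1"
}
```

nlevels() counts declared levels, including unused ones, so a column may need droplevels() first if its factor carries levels that no longer occur in the data.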
