I have some nominal (not ordinal) variables encoded as integers, which I would like to encode in binary (not as dummies or one-hot!). The following code is what I came up with, adapted from other code I found. Is this a valid and scalable approach? Thanks!
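For context: binary encoding writes each integer code in base 2 and stores one bit per column, so codes up to K need floor(log2(K)) + 1 columns, compared with K columns for one-hot encoding. A small base-R sketch of that count (n_bits() is a hypothetical helper, and it assumes non-negative integer codes):

```r
# Hypothetical helper: number of binary columns needed for
# non-negative integer codes whose largest value is max_code.
n_bits <- function(max_code) {
  stopifnot(max_code >= 0)
  if (max_code == 0) return(1L)  # zero still needs one column
  as.integer(floor(log2(max_code)) + 1)
}

n_bits(3)    # codes up to 3 fit in 2 bits
n_bits(4)    # 4 is binary 100, so 3 bits
n_bits(100)  # 7 bits instead of 100 one-hot columns
```

With the example data below, x1 (largest code 3) therefore gets 2 binary columns and x2 (largest code 4) gets 3.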

library(binaryLogic)

df <- data.frame(x1 = c(1, 1, 2, 3), x2 = c(1, 2, 3, 4))

encode_binary <- function(x, name = "binary_") {
    x2 <- as.binary(x)
    maxlen <- max(sapply(x2, length))
    x2 <- lapply(x2, function(y) {
        l <- length(y)
        if (l < maxlen) {
            y <- c(rep(0, (maxlen - l)), y)
        }
        y
    })
    d <- as.data.frame(t(as.data.frame(x2)))
    rownames(d) <- NULL
    colnames(d) <- paste0(name, 1:maxlen)
    d
}

df <- cbind(df, encode_binary(df[["x1"]], name = "binary_x1_"))
df <- cbind(df, encode_binary(df[["x2"]], name = "binary_x2_"))

df
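Instead of one cbind() call per column, the encoding can be applied to a vector of column names in one go. The sketch below is self-contained base R: bits_of() is a hypothetical stand-in for encode_binary() that uses bitwShiftR() and bitwAnd() instead of binaryLogic, and encode_cols() is a hypothetical wrapper; both assume non-negative integer codes.

```r
# Hypothetical stand-in for encode_binary(): one integer column per bit,
# most significant bit first, width set by the largest value in x.
bits_of <- function(x, name) {
  x <- as.integer(x)
  stopifnot(!anyNA(x), all(x >= 0L))
  # smallest number of bits that can represent max(x)
  width <- 1L
  while (any(bitwShiftR(x, width) > 0L)) width <- width + 1L
  # column j holds bit (width - j), i.e. the value of 2^(width - j)
  out <- vapply((width - 1L):0L,
                function(k) bitwAnd(bitwShiftR(x, k), 1L),
                integer(length(x)))
  out <- matrix(out, nrow = length(x))  # keep matrix shape for length-1 x
  d <- as.data.frame(out)
  colnames(d) <- paste0(name, seq_len(width))
  d
}

# Hypothetical wrapper: encode several columns and bind them to df once.
encode_cols <- function(df, cols) {
  encoded <- lapply(cols, function(col) {
    bits_of(df[[col]], name = paste0("binary_", col, "_"))
  })
  do.call(cbind, c(list(df), encoded))
}

df <- data.frame(x1 = c(1, 1, 2, 3), x2 = c(1, 2, 3, 4))
encode_cols(df, c("x1", "x2"))
```

Swapping bits_of() for encode_binary() (or the faster function in the answer) only requires changing the call inside encode_cols().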

1 Answer

If we test on a larger vector, your approach is quite slow (about 22 seconds for 100,000 values):

test_vec <- 1:1e5
system.time(v1 <- encode_binary(test_vec, name = "binary_x1_"))
#  user  system elapsed 
# 22.23    0.08   22.37 

Based on this SO question, I wrote a version using base R's intToBits() that is much faster (about 0.6 s versus 22 s on the same vector):

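encode_binary2 <- function(x, name = "binary_") {
  # one row per value, bits ordered from most to least significant
  m <- sapply(x, function(v) rev(as.integer(intToBits(v))))
  tm <- t(m)
  # drop leading all-zero bit columns (keep at least one column if all x are 0)
  i <- which(colSums(tm) != 0L)[1]
  if (is.na(i)) i <- ncol(tm)
  # drop = FALSE keeps a matrix when only one row or column remains
  tm <- tm[, i:ncol(tm), drop = FALSE]
  # save to data.frame
  d <- as.data.frame(tm)
  rownames(d) <- NULL
  colnames(d) <- paste0(name, 1:ncol(d))
  d
}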
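A further step, which I haven't benchmarked here: intToBits() is vectorised, so it can be called once on the whole vector instead of once per element inside sapply(). A sketch of that idea (encode_binary3 is a hypothetical name; like the version above, it assumes non-negative integer codes below 2^31):

```r
# Hypothetical variant of encode_binary2(): a single intToBits() call
# on the whole vector instead of one call per element via sapply().
encode_binary3 <- function(x, name = "binary_") {
  x <- as.integer(x)
  # intToBits() returns 32 bits per value, least significant bit first,
  # so filling by row gives one row per value and column k = bit (k - 1)
  bits <- matrix(as.integer(intToBits(x)), ncol = 32L, byrow = TRUE)
  # keep bits up to the highest one set anywhere (at least one column)
  used <- which(colSums(bits) > 0L)
  width <- if (length(used) == 0L) 1L else max(used)
  # reverse so the most significant bit comes first, as in encode_binary2()
  tm <- bits[, width:1L, drop = FALSE]
  d <- as.data.frame(tm)
  colnames(d) <- paste0(name, seq_len(width))
  d
}

encode_binary3(c(1L, 1L, 2L, 3L, 4L), name = "b_")
```

It should return the same columns as encode_binary2(), which can be checked with all.equal() in the same way; time it with system.time() on your own data before assuming it is faster.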

system.time(v2 <- encode_binary2(test_vec, name = "binary_x1_"))
# user  system elapsed 
# 0.61    0.02    0.63 

# test that results are equal:
all.equal(v1, v2)
# [1] TRUE
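Since the goal is to encode nominal codes without losing information, a round trip back to the original integers is another useful check. A minimal sketch, assuming bit columns ordered from most to least significant as both functions above produce (decode_binary is a hypothetical helper):

```r
# Hypothetical helper: turn MSB-first bit columns back into integers.
decode_binary <- function(d) {
  bits <- as.matrix(d)
  # weights 2^(n-1), ..., 2, 1 for n bit columns
  weights <- 2^((ncol(bits) - 1):0)
  as.integer(bits %*% weights)
}

# bit columns for the codes 1, 2, 3, 4 (written out by hand)
bits <- data.frame(binary_1 = c(0L, 0L, 0L, 1L),
                   binary_2 = c(0L, 1L, 1L, 0L),
                   binary_3 = c(1L, 0L, 1L, 0L))
decode_binary(bits)
```

With encode_binary2() defined, identical(decode_binary(encode_binary2(test_vec)), test_vec) should return TRUE.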