I have a data table with a column containing hexadecimal data, which I would like to convert into binary, creating new columns with the binary data. Example code:

library(data.table)
library(BMS)

# Create a data table
dt <- data.table(Z=c(1:4), 
                 HDATA=c("1234","5678","9ACB","DEF0"))
# Convert the HDATA column to binary
Bin_names <- sapply(c(15:0), function(x) paste0('C',x))
dt[,Bin_names:=hex2bin(as.character(HDATA)),]

However, this gives me the following warning message:

Warning message: In [.data.table(dt, , :=(Bin_names, hex2bin(as.character(HDATA))), : Supplied 76 items to be assigned to 4 items of column 'Bin_names' (72 unused)

and the modified data table looks like this,

> dt
   Z HDATA Bin_names
1: 1  1234         0
2: 2  5678         0
3: 3  9ACB         0
4: 4  DEF0         1
> 

How do I get this to give me output that looks like this?

   Z HDATA C15 C14 C13 C12 C11 C10 C9 C8 C7 C6 C5 C4 C3 C2 C1 C0
1: 1  1234   0   0   0   1   0   0  1  0  0  0  1  1  0  1  0  0
2: 2  5678   0   1   0   1   0   1  1  0  0  1  1  1  1  0  0  0
3: 3  9ACB   1   0   0   1   1   0  1  0  1  1  0  0  1  0  1  1
4: 4  DEF0   1   1   0   1   1   1  1  0  1  1  1  1  0  0  0  0

My actual data table has about 10M rows so I am looking for a fast method to do this. Thanks,

1 Answer

Well, one of the problems seems to be that hex2bin doesn't vectorize properly: it returns one flat vector rather than a separate set of 16 values for each input. What we really want is to convert each number separately. Also, as far as I can tell, the data.table := operator wants a list on the right-hand side of the assignment rather than a matrix. So let us define a helper function:

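bincols <- function(x) {
   # Apply hex2bin to each string separately; transpose so each row is one input
   y <- t(Vectorize(hex2bin)(as.character(x)))
   # Return the bit columns as an unnamed list, which := can assign from
   c(unname(as.data.table(y)))
}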

This creates a list of 16 elements, where each element is a vector of 0/1 values with one entry per value passed to the function. We can then use it with your assignment command:

dt[, c(Bin_names) := bincols(HDATA)]

This seems to work. I have a feeling some of the transformations I'm doing might be unnecessary, so someone with more data.table experience might be able to suggest improvements.
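
Since the question mentions roughly 10M rows, calling hex2bin once per string may turn out to be slow. As a rough sketch of a fully vectorized alternative (not something I have benchmarked on that scale), one could skip hex2bin entirely, parse the strings with base R's strtoi, and pull out each bit with integer arithmetic. The helper name hex_to_bits and its n_bits argument are made up for this example, and it assumes every value fits in 16 bits (four hex digits), as in the sample data:

```r
# Hypothetical helper (not part of BMS or data.table): converts a character
# vector of hex strings into a list of 0/1 integer vectors, one per bit,
# ordered from the most significant bit (C15) to the least (C0).
# Assumes each value fits in n_bits bits; invalid hex strings give NA.
hex_to_bits <- function(x, n_bits = 16L) {
  vals <- strtoi(as.character(x), base = 16L)   # parse all strings at once
  ks <- (n_bits - 1L):0L                        # bit positions, high to low
  bits <- lapply(ks, function(k) as.integer((vals %/% 2^k) %% 2))
  names(bits) <- paste0("C", ks)
  bits
}

# Small demonstration on the values from the question
bits <- hex_to_bits(c("1234", "5678", "9ACB", "DEF0"))
str(bits[c("C15", "C0")])
```

Because the result is already a list with one element per column, it should be usable directly on the right-hand side of the assignment, e.g. dt[, (Bin_names) := hex_to_bits(HDATA)].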

3 Comments

You can just wrap Bin_names like c(Bin_names) as well.
dt[,c(Bin_names):=data.table(t(sapply(as.character(dt$HDATA),hex2bin)))] would be a minor simplification avoiding the need for Vectorize in the helper function.
Well, that is basically what Vectorize does; I think it's more readable with Vectorize.
