I have a column of type binary. The values are 4 bytes long, and I would like to interpret them as an Int. An example DataFrame looks like this:
val df = Seq(
(Array(0x00.toByte, 0x00.toByte, 0x02.toByte, 0xe6.toByte))
).toDF("binary_value")
The 4 bytes in this example can be interpreted as an unsigned 32-bit integer (u32) forming the number 742. Using a UDF, the value can be decoded like this:
val bytesToInt = udf((x: Array[Byte]) => BigInt(x).toInt)
df.withColumn("numerical_value", bytesToInt('binary_value))
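For reference, the decode logic can be sanity-checked outside Spark with plain Scala, since BigInt(Array[Byte]) interprets the bytes as a big-endian two's-complement value:

```scala
// Same bytes as in the example DataFrame above
val bytes = Array(0x00.toByte, 0x00.toByte, 0x02.toByte, 0xe6.toByte)

// 0x000002e6 == 742
println(BigInt(bytes).toInt) // prints 742
```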
It works, but at the cost of using a UDF and the corresponding serialization / deserialization overhead. I was hoping to do something like 'binary_value.cast("array<byte>") and take it from there, or even 'binary_value.cast("int"), but Spark doesn't allow either cast.
Is there a way to interpret the binary column to another data type using Spark native functions?