
I'm creating a Spark Dataset in Scala using a case class and the spark.sql({query}).as[MyCaseClass] syntax.

All is fine until I try to create a Dataset with one of my members defined as Array[Array[Byte]]:

case class HbaseRow(
  ip: Array[Array[Byte]]
)

val hbaseDataSet = spark
   .sql("""select ip from test_data""")
   .as[HbaseRow]

Normally this works fine, but with the array of byte arrays it fails:

java.lang.ClassCastException: org.apache.spark.sql.types.BinaryType$ cannot be cast to org.apache.spark.sql.types.ObjectType
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$arrayClassFor$1.apply(ScalaReflection.scala:106)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$org$apache$spark$sql$catalyst$ScalaReflection$$arrayClassFor$1.apply(ScalaReflection.scala:95)
at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:56)

The column is a Spark array of IP addresses, each encoded as a byte array.
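For reference, here is a minimal sketch that produces a column of the same array<binary> shape (hypothetical sample data and table name; assumes a local SparkSession):

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{ArrayType, BinaryType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("repro").getOrCreate()

// Each row holds several IPv4 addresses, each one a 4-byte array.
val schema = StructType(Seq(StructField("ip", ArrayType(BinaryType))))
val rows = Seq(
  Row(Seq(Array[Byte](10, 0, 0, 1), Array[Byte](10, 0, 0, 2))),
  Row(Seq(Array[Byte](-64, -88, 1, 1))) // 192.168.1.1; JVM bytes are signed
)
val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
df.createOrReplaceTempView("test_data")
df.printSchema() // ip: array (element: binary)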

1 Answer


OK, I asked this while stuck, but I believe I've landed on the solution: defining the element type of my case class field as an Option seems to have done the trick. Wrapping the inner Array[Byte] in Option apparently steers Catalyst's reflection away from the Array[Array[Byte]] code path that throws the ClassCastException.

scala> case class HbaseRow(
 |     ip: Array[Option[Array[Byte]]]
 | )
defined class HbaseRow

scala> df.select($"ip").as[HbaseRow]
res13: org.apache.spark.sql.Dataset[HbaseRow] = [ip: array<binary>]
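As a sanity check, here is a small follow-up sketch (assuming IPv4, i.e. 4-byte values, the df from above, and spark.implicits._ in scope as it is in the shell) that decodes each optional byte array back into a dotted-quad string:

// Hypothetical decode step: None entries (NULL array elements) are dropped,
// and each remaining byte array becomes a dotted-quad string.
val ips = df.select($"ip").as[HbaseRow].map { row =>
  row.ip.flatten.map(bytes => bytes.map(b => (b & 0xFF).toString).mkString("."))
}
ips.show(truncate = false)

The & 0xFF masking matters because JVM bytes are signed, so 192 round-trips through the byte array as -64.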