Using Spark 2.11, I have the following Dataset (read from a Cassandra table):
+------------+-------------------------------------------------------+
|id          |attributes                                             |
+------------+-------------------------------------------------------+
|YH8B135U123 |[{"id":1,"name":"function","score":10.0,"snippets":1}] |
+------------+-------------------------------------------------------+
This is the output of printSchema():
root
 |-- id: string (nullable = true)
 |-- attributes: string (nullable = true)
As the schema shows, the attributes column is a string holding an array of JSON objects. I'm trying to parse it and explode it into a Dataset, but I keep failing. I tried to define the schema as follows:
StructType type = new StructType()
.add("id", new IntegerType(), false)
.add("name", new StringType(), false)
.add("score", new FloatType(), false)
.add("snippets", new IntegerType(), false );
ArrayType schema = new ArrayType(type, false);
And provided it to from_json as follows:
df = df.withColumn("val", functions.from_json(df.col("attributes"), schema));
This fails with a MatchError:
Exception in thread "main" scala.MatchError: org.apache.spark.sql.types.IntegerType@43756cb (of class org.apache.spark.sql.types.IntegerType)
What's the correct way to do that?
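For what it's worth, my current suspicion is that the MatchError comes from instantiating the type classes directly (new IntegerType() and friends) instead of using the singletons that Spark's internals pattern-match against. Below is a minimal self-contained sketch of the variant I'd try next, built on the DataTypes factory; the class name and the hard-coded row standing in for the Cassandra read are mine, and it assumes a Spark version whose from_json accepts an ArrayType (2.2+). I haven't confirmed this is the intended Java usage:

import java.util.Collections;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.types.ArrayType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class FromJsonSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("from_json sketch")
                .getOrCreate();

        // Hard-coded stand-in for the Cassandra read.
        List<Row> rows = Collections.singletonList(RowFactory.create(
                "YH8B135U123",
                "[{\"id\":1,\"name\":\"function\",\"score\":10.0,\"snippets\":1}]"));
        StructType inputSchema = new StructType()
                .add("id", DataTypes.StringType)
                .add("attributes", DataTypes.StringType);
        Dataset<Row> df = spark.createDataFrame(rows, inputSchema);

        // Element schema built from the DataTypes singletons instead of
        // `new IntegerType()` etc., which is what I suspect trips the MatchError.
        StructType attributeType = new StructType()
                .add("id", DataTypes.IntegerType, false)
                .add("name", DataTypes.StringType, false)
                .add("score", DataTypes.FloatType, false)
                .add("snippets", DataTypes.IntegerType, false);
        ArrayType schema = DataTypes.createArrayType(attributeType, false);

        // Parse the JSON string into an array of structs, then explode it
        // so each attribute object becomes its own row.
        df.withColumn("val", functions.from_json(df.col("attributes"), schema))
          .withColumn("attr", functions.explode(functions.col("val")))
          .show(false);
    }
}

If the DataTypes factory is indeed the expected way to build schemas from Java, I'd still like to understand why the directly instantiated types fail the match.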