I have a DataFrame orders:
+----+-----------+------+
|  Id|      Order|Gender|
+----+-----------+------+
|1622|[101330001]|  Male|
|1622|   [147678]|  Male|
|3837|  [1710544]|  Male|
+----+-----------+------+
I want to group it by Id and Gender and then aggregate the Order column. I am using the org.apache.spark.sql.functions package, and my code looks like this:
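// assumes: import static org.apache.spark.sql.functions.*;
DataFrame group = orders.withColumn("orders", col("Order"))
    .groupBy(col("Id"), col("Gender"))
    // aggregate the array column added above (it is named "orders", not "products")
    .agg(collect_list("orders"));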
However, since the Order column is of type array, I get the exception below, because collect_list expects a primitive type:
User class threw exception: org.apache.spark.sql.AnalysisException: No handler for Hive udf class org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCollectList because: Only primitive type arguments are accepted but array<string> was passed as parameter 1
I have looked through the package and found sort functions for arrays, but no aggregate functions for them. Any idea how to do this? Thanks.
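
To make the intended result concrete, here is a minimal plain-Java sketch of the grouping described above, with no Spark involved. It assumes the goal is a single flattened list of order IDs per (Id, Gender) pair; the question does not say whether a list of arrays would also be acceptable. The class GroupOrdersSketch, the Row type and the groupOrders method are hypothetical names used only for illustration, not part of any Spark API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupOrdersSketch {

    // Hypothetical row type mirroring the three columns of the orders DataFrame.
    public static final class Row {
        final int id;
        final List<String> order;
        final String gender;

        public Row(int id, List<String> order, String gender) {
            this.id = id;
            this.order = order;
            this.gender = gender;
        }
    }

    // Groups rows by (Id, Gender) and concatenates the Order arrays of each group.
    // Assumption: a flattened list per group is the desired output.
    public static Map<String, List<String>> groupOrders(List<Row> rows) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Row r : rows) {
            String key = r.id + "|" + r.gender; // composite grouping key
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).addAll(r.order);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Same three rows as in the table above.
        List<Row> rows = Arrays.asList(
                new Row(1622, Arrays.asList("101330001"), "Male"),
                new Row(1622, Arrays.asList("147678"), "Male"),
                new Row(3837, Arrays.asList("1710544"), "Male"));

        // prints {1622|Male=[101330001, 147678], 3837|Male=[1710544]}
        System.out.println(groupOrders(rows));
    }
}
```

In other words, the two rows for Id 1622 should collapse into one row whose order list contains both 101330001 and 147678, while Id 3837 keeps its single order.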