I have a DataFrame orders:

+-----------------+-----------+--------------+
|               Id|      Order|        Gender|
+-----------------+-----------+--------------+
|             1622|[101330001]|          Male|
|             1622|   [147678]|          Male|
|             3837|  [1710544]|          Male|
+-----------------+-----------+--------------+

I want to group it by Id and Gender and then aggregate the orders. I am using the org.apache.spark.sql.functions package, and my code looks like this:

DataFrame group = orders.withColumn("orders", col("order"))
                .groupBy(col("Id"), col("Gender"))
                .agg(collect_list("orders"));

However, since the Order column is of type array, I get this exception, because collect_list expects a primitive type:

User class threw exception: org.apache.spark.sql.AnalysisException: No handler for Hive udf class org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCollectList because: Only primitive type arguments are accepted but array<string> was passed as parameter 1

I have looked in the package and there are sort functions for arrays but no aggregate functions. Any idea how to do it? Thanks.

1 Answer

In this case, you can define your own function and register it as a UDF:

// U is the return type, T is the type of the input column
val userDefinedFunction: T => U = ???
val udfFunctionName = udf[U, T](userDefinedFunction)

Then, instead of aggregating the array column directly, pass it through that function so that it is converted into a primitive type, and use the result in the withColumn method.

Something like this:

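import org.apache.spark.sql.functions.{col, collect_list, udf}

// Spark hands array columns to Scala UDFs as Seq, not Array.
// The error reports array<string>, so the elements are Strings.
// Note: head keeps only the first element of each array.
val dataF: Seq[String] => String = _.head

val dataUDF = udf[String, Seq[String]](dataF)

val group = orders.withColumn("orders", dataUDF(col("order")))
                .groupBy(col("Id"), col("Gender"))
                .agg(collect_list("orders"))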

I hope it works!
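To make the effect of this workaround concrete, here is a minimal plain-Java sketch (no Spark involved) of what "take the head of each array, then collect per Id and Gender" computes on the sample rows from the question. The class, method, and key format (`OrderGroupingSketch`, `collectHeads`, `"Id/Gender"`) are hypothetical names chosen for illustration, and the sketch assumes every Order array is non-empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Spark-free model of the workaround described above.
class OrderGroupingSketch {

    // One row of the orders table from the question.
    static final class Row {
        final int id;
        final List<String> order;
        final String gender;

        Row(int id, List<String> order, String gender) {
            this.id = id;
            this.order = order;
            this.gender = gender;
        }
    }

    // Mirrors the UDF: reduce the array to its first element.
    // Assumption: the array is never empty; later elements are dropped.
    static String head(List<String> order) {
        return order.get(0);
    }

    // Mirrors groupBy(Id, Gender) followed by collect_list on the reduced column.
    static Map<String, List<String>> collectHeads(List<Row> rows) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Row row : rows) {
            String key = row.id + "/" + row.gender;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(head(row.order));
        }
        return groups;
    }

    // The three sample rows shown in the question.
    static List<Row> sampleRows() {
        return Arrays.asList(
                new Row(1622, Arrays.asList("101330001"), "Male"),
                new Row(1622, Arrays.asList("147678"), "Male"),
                new Row(3837, Arrays.asList("1710544"), "Male"));
    }

    public static void main(String[] args) {
        // Prints {1622/Male=[101330001, 147678], 3837/Male=[1710544]}
        System.out.println(collectHeads(sampleRows()));
    }
}
```

The sketch keeps rows in input order, whereas Spark does not guarantee the order of elements inside each collected list. If an Order array could hold more than one element, taking only the head would silently lose data, so this approach fits only the single-element arrays shown in the question.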
