I created a DataFrame as follows:
import spark.implicits._
import org.apache.spark.sql.functions._
val df = Seq(
(1, List(1,2,3)),
(1, List(5,7,9)),
(2, List(4,5,6)),
(2, List(7,8,9)),
(2, List(10,11,12))
).toDF("id", "list")
val df1 = df.groupBy("id").agg(collect_set($"list").as("col1"))
df1.show(false)
Then I tried to convert the WrappedArray row value to string as follows:
import org.apache.spark.sql.functions._
def arrayToString = udf((arr: collection.mutable.WrappedArray[collection.mutable.WrappedArray[String]]) => arr.flatten.mkString(", "))
val d = df1.withColumn("col1", arrayToString($"col1"))
d: org.apache.spark.sql.DataFrame = [id: int, col1: string]
scala> d.show(false)
+---+----------------------------+
|id |col1 |
+---+----------------------------+
|1 |1, 2, 3, 5, 7, 9 |
|2 |4, 5, 6, 7, 8, 9, 10, 11, 12|
+---+----------------------------+
What I really want is to generate an output like the following:
+---+----------------------------+
|id |col1 |
+---+----------------------------+
|1 |1$2$3, 5$7$ 9 |
|2 |4$5$6, 7$8$9, 10$11$12 |
+---+----------------------------+
How can I achieve this?