I have a Spark DataFrame in Scala (variable df):
id, values
"a", [0.5, 0.6]
"b", [0.1, 0.2]
...
I am trying to make use of RowMatrix to calculate pairwise cosine similarity efficiently.
final case class dataRow(id: String, values: Array[Double])
val rows = df.as[dataRow].map { row =>
  Vectors.dense(row.values)
}.rdd
I am getting the following compilation error:
Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._
Eventually, I would like to be able to do this (RowMatrix requires an RDD[Vector]):
val mat = new RowMatrix(rows)
I have already imported spark.implicits._. What am I doing wrong?
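A likely cause, sketched here under assumptions: the map runs on a Dataset, so Spark needs an implicit Encoder for the result type, and spark.implicits._ does not appear to supply one for org.apache.spark.mllib.linalg.Vector. Converting to an RDD before mapping avoids the encoder lookup. The object name RowMatrixSketch and the helper toRowMatrix are hypothetical, and the sketch assumes the old mllib (not ml) Vectors are in use.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Defined at top level so that Spark can derive an encoder for it.
final case class dataRow(id: String, values: Array[Double])

object RowMatrixSketch {
  // Hypothetical helper: builds a RowMatrix from a DataFrame with
  // columns (id: String, values: Array[Double]).
  def toRowMatrix(df: DataFrame): RowMatrix = {
    val spark = df.sparkSession
    import spark.implicits._

    // Switch to an RDD first: RDD.map needs no Encoder for its result type,
    // whereas Dataset.map would need an Encoder[Vector].
    val rows: RDD[Vector] =
      df.as[dataRow].rdd.map(row => Vectors.dense(row.values))

    new RowMatrix(rows)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("row-matrix-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample data mirroring the question.
    val df = Seq(("a", Array(0.5, 0.6)), ("b", Array(0.1, 0.2)))
      .toDF("id", "values")

    val mat = toRowMatrix(df)
    println(s"rows = ${mat.numRows()}, cols = ${mat.numCols()}")

    spark.stop()
  }
}
```

The only change relative to the snippet in the question is the position of .rdd: it comes before map rather than after it.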
Does the id matter in any way? Note that RowMatrix will not keep the order between the rows; if that is important, use IndexedRowMatrix instead.
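If the row order or the id does matter, the following is a minimal sketch of the IndexedRowMatrix route, assuming the same (id, values) schema as in the question. The object name IndexedSketch, the helper toIndexed, and the idea of keeping a separate index-to-id lookup are illustrative assumptions, not something the original post specifies.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

// Defined at top level so that Spark can derive an encoder for it.
final case class dataRow(id: String, values: Array[Double])

object IndexedSketch {
  // Hypothetical helper: returns the matrix plus a lookup from row index to id.
  def toIndexed(df: DataFrame): (IndexedRowMatrix, RDD[(Long, String)]) = {
    val spark = df.sparkSession
    import spark.implicits._

    // Attach a stable Long index to every row; cache so that both
    // derived RDDs see the same index assignment.
    val withIndex = df.as[dataRow].rdd.zipWithIndex().cache()

    // Each IndexedRow carries its own index, so row identity is preserved.
    val indexedRows: RDD[IndexedRow] = withIndex.map {
      case (row, idx) => IndexedRow(idx, Vectors.dense(row.values))
    }

    // Side table to translate matrix row indices back to the original ids.
    val idByIndex: RDD[(Long, String)] = withIndex.map {
      case (row, idx) => (idx, row.id)
    }

    (new IndexedRowMatrix(indexedRows), idByIndex)
  }
}
```

Calling IndexedSketch.toIndexed(df) yields the matrix together with the lookup, so results expressed in row indices can be joined back to the id column afterwards.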