I'm trying to get the distinct values of a single column of a DataFrame (called: df) into an Array that matches the data type of the column. This is what I've tried, but it does not work:
    def distinctValues[T: ClassTag](column: String): Array[T] = {
      df.select(df(column)).distinct.map {
        case Row(s: T) => s
      }.collect
    }
The method is inside an implicit class, so calling df.distinctValues("some_col") gives me:

    scala.MatchError: [ABCD] (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)
Is there an elegant way to achieve what I want that is also type safe?
I'm on Spark 1.4.1.
Replace the pattern match with map(_.getAs[T](column)) and you'll get what you want: Row.getAs casts the column value using the type parameter directly, whereas the pattern match on s: T cannot actually check T at runtime because of type erasure.
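To illustrate the idea without a Spark cluster, here is a minimal Spark-free sketch. The Row class below is a hypothetical stand-in for org.apache.spark.sql.Row (it only mimics getAs), and distinctValues mirrors the corrected method: a ClassTag-backed extraction instead of a pattern match on the erased T.

    import scala.reflect.ClassTag

    // Hypothetical stand-in for Spark's Row: holds untyped values and
    // exposes a getAs that casts a position to the requested type.
    final case class Row(values: Any*) {
      def getAs[T](i: Int): T = values(i).asInstanceOf[T]
    }

    // Mirrors the corrected method: deduplicate, then extract column 0
    // from each row via getAs. The ClassTag context bound is what lets
    // us build an Array[T] at the end.
    def distinctValues[T: ClassTag](rows: Seq[Row]): Array[T] =
      rows.distinct.map(_.getAs[T](0)).toArray

    val rows = Seq(Row("ABCD"), Row("EFGH"), Row("ABCD"))
    val out  = distinctValues[String](rows)
    // out contains "ABCD" and "EFGH", typed as Array[String]

In the real method the same shape applies on Spark 1.4.1: the collect at the end still needs the ClassTag[T] so Spark can materialize the Array[T] on the driver.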