Input Spark Scala Dataframe Column as Vector

Question

I'm relatively new to Scala and the Spark API, and I have a question about using the VectorAssembler

http://spark.apache.org/docs/latest/ml-features.html#vectorassembler

and then computing a correlation matrix with the statistics tools

https://spark.apache.org/docs/2.1.0/mllib-statistics.html#correlations

The DataFrame column is of type linalg.Vector:

val assembler = new VectorAssembler()

val trainwlabels3 = assembler.transform(trainwlabels2)

trainwlabels3.dtypes(0)

res90: (String, String) = (features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7)

Yet passing this column into an RDD for the statistics tool throws a type mismatch error:

val data: RDD[Vector] = sc.parallelize(
  trainwlabels3("features")
) 

<console>:80: error: type mismatch;
 found   : org.apache.spark.sql.Column
 required: Seq[org.apache.spark.mllib.linalg.Vector]

Thanks in advance for any help.

user7927452 · Accepted Answer · 2017-04-26 18:55:08Z

1

You should just select the column:

val features = trainwlabels3.select($"features")

convert it to an RDD:

val featuresRDD = features.rdd

and map each Row to its vector:

featuresRDD.map(_.getAs[Vector]("features"))
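One detail the steps above leave open: VectorAssembler produces org.apache.spark.ml.linalg.Vector values, while the RDD-based Statistics.corr expects org.apache.spark.mllib.linalg.Vector, so a conversion is likely needed in the map step. The following is a minimal end-to-end sketch under stated assumptions, not the asker's actual pipeline: the sample data and the column names a, b and c are hypothetical stand-ins for trainwlabels2, and the local SparkSession is only there to make the snippet self-contained (in spark-shell, spark already exists).

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.{Vector => MLVector}
import org.apache.spark.mllib.linalg.{Matrix, Vector => MLlibVector, Vectors => MLlibVectors}
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("vector-column-to-rdd")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for trainwlabels2; the real column names are unknown.
val trainwlabels2 = Seq(
  (1.0, 10.0, 0.5),
  (2.0, 20.0, 0.1),
  (3.0, 30.0, 0.9),
  (4.0, 40.0, 0.3)
).toDF("a", "b", "c")

// The assembler needs its input and output columns set before transform.
val assembler = new VectorAssembler()
  .setInputCols(Array("a", "b", "c"))
  .setOutputCol("features")

val trainwlabels3 = assembler.transform(trainwlabels2)

// Select the column, drop to an RDD[Row], then pull out each ml vector
// and convert it to the mllib vector type that Statistics.corr expects.
val data: RDD[MLlibVector] = trainwlabels3
  .select("features")
  .rdd
  .map(row => MLlibVectors.fromML(row.getAs[MLVector]("features")))

// Pearson correlation matrix across the assembled feature columns.
val corrMatrix: Matrix = Statistics.corr(data, "pearson")
println(corrMatrix)
```

Renaming the two Vector types on import keeps the ml and mllib variants from being confused, which appears to be the source of the mismatch in the question.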

answered Apr 26, 2017 at 18:55

user7927452


Vidya · Accepted Answer · 2017-04-27 12:55:39Z

0

This should work for you:

val rddForStatistics = new VectorAssembler()
   .transform(trainwlabels2)
   .select($"features")
   .as[Vector] // turns Dataset[Row] (a.k.a. DataFrame) into Dataset[Vector]
   .rdd

However, you should avoid RDDs where you can and do this with the DataFrame-based API (the spark.ml package) instead: the RDD-based MLlib API has been in maintenance mode since Spark 2.0.
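For the correlation use case specifically, a DataFrame-based route might look like the sketch below. It assumes Spark 2.2 or later, where org.apache.spark.ml.stat.Correlation is available (the question links to the 2.1.0 docs, which predate it). The sample data and column names are hypothetical, and the local SparkSession is only for self-containment.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Matrix
import org.apache.spark.ml.stat.Correlation
import org.apache.spark.sql.{Row, SparkSession}

// Assumption: a local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("dataframe-correlation")
  .getOrCreate()
import spark.implicits._

// Hypothetical numeric columns standing in for the asker's data.
val df = Seq(
  (1.0, 10.0, 0.5),
  (2.0, 20.0, 0.1),
  (3.0, 30.0, 0.9),
  (4.0, 40.0, 0.3)
).toDF("a", "b", "c")

// Assemble the numeric columns into a single ml.linalg.Vector column.
val assembled = new VectorAssembler()
  .setInputCols(Array("a", "b", "c"))
  .setOutputCol("features")
  .transform(df)

// Correlation.corr returns a one-row DataFrame holding the matrix;
// Pearson is the default method.
val Row(pearson: Matrix) = Correlation.corr(assembled, "features").head
println(pearson)
```

This stays on the vector column directly, so no RDD conversion or ml-to-mllib vector conversion is involved.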

edited Apr 27, 2017 at 12:55

answered Apr 27, 2017 at 0:17

Vidya
