
I am using Spark's KMeans with Scala and I need to save the resulting cluster centers to a CSV file. This val has type Array[DenseVector].

val clusters = KMeans.train(parsedData, numClusters, numIterations)
val centers = clusters.clusterCenters

I tried converting centers to an RDD and then from the RDD to a DataFrame, but I ran into a lot of problems (e.g., import spark.implicits._ / SQLContext.implicits._ does not work, so I cannot use .toDF). Is there an easier way to produce a CSV?

Any suggestions?

1 Answer

Without any external libraries, you can do this by simply writing the file the Java way:

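import java.io.{ File, PrintWriter }

...

// Write one cluster center per line; close the writer even if a write fails
val pw = new PrintWriter( new File( "KMeans_centers.csv" ) )
try {
  centers.foreach( vec =>
    // DenseVector.toString renders "[x,y,z]"; strip the brackets to get a CSV row
    pw.write( vec.toString.drop( 1 ).dropRight( 1 ) + "\n" )
  )
} finally {
  pw.close()
}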
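On Java 7 and later, the same plain-JVM approach can also be written with java.nio.file.Files, which creates the file, writes the bytes and closes it in a single call. This is only a sketch: it assumes the centers have first been turned into plain arrays (for example with centers.map(_.toArray)), and the names CentersCsv and writeCenters are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical helper: writes one cluster center per line as comma-separated values.
object CentersCsv {
  def writeCenters(rows: Array[Array[Double]], path: Path): Path = {
    // Join each center's coordinates with commas and end every row with "\n"
    val csv = rows.map(_.mkString(",") + "\n").mkString
    // Files.write creates (or truncates) the file, writes the bytes, then closes it
    Files.write(path, csv.getBytes(StandardCharsets.UTF_8))
  }
}
```

Usage, assuming centers from the question: CentersCsv.writeCenters(centers.map(_.toArray), java.nio.file.Paths.get("KMeans_centers.csv")). Building the whole string first is fine here because there are only numClusters rows.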

The resulting file:

0.1,0.1,0.1
9.1,9.1,9.1

drop(1) and dropRight(1) are needed to remove the square brackets [] that DenseVector.toString puts around the values.
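The bracket stripping works because a dense vector's string form is its values joined by commas inside square brackets. A sketch that avoids depending on that string format formats the raw values directly. It assumes you pass vec.toArray (a method on Spark's Vector), and the names CsvLine, toCsvLine and viaToString are hypothetical; viaToString only imitates the "[x,y,z]" form to show that both routes give the same row.

```scala
// Hypothetical helpers for turning one cluster center into a CSV row.
object CsvLine {
  // Format the coordinates directly, without relying on Vector.toString
  def toCsvLine(values: Array[Double]): String = values.mkString(",")

  // Imitates the assumed "[x,y,z]" string form, then strips the brackets as the answer does
  def viaToString(values: Array[Double]): String =
    values.mkString("[", ",", "]").drop(1).dropRight(1)
}
```

Inside the loop this would become pw.write( CsvLine.toCsvLine( vec.toArray ) + "\n" ), which also keeps working if the vector's toString format ever differs.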

The code and data are taken from the official Spark MLlib k-means example.
