Spark: How to save array as two column CSV?

Question

I have an RDD of predictions and labels from a logistic regression model, which looks like this:

labelAndPreds: org.apache.spark.rdd.RDD[(Double, Double)] = MapPartitionsRDD[517] at map at <console>:52

scala> labelAndPreds.collect()
res2: Array[(Double, Double)] = Array((0.004106564139257318, 0.0), 
(0.3641478408865635, 0.0), (0.9999258409695498, 1.0), (0.342287288060...

How can I save it on local disk in CSV format with two columns (one for labels and one for predictions)?

Francois G · Accepted Answer · 2015-11-16 14:10:20Z


You can use the spark-csv package:

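import org.apache.spark.sql.SQLContext

// implicits is a member of a SQLContext instance, so create one first
// (sc is the SparkContext provided by spark-shell)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// convert the RDD of pairs from the question into a two-column DataFrame
val df = labelAndPreds.toDF("labels", "predictions")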

df.write
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .save("labelsAndPreds.csv")
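Note that `save` writes a directory of part files at that path rather than a single file. If the result is small enough to fit on the driver, one alternative is to collect it and write the file yourself. The following is only a minimal sketch under that assumption: `writePairsAsCsv` is a hypothetical helper, not part of Spark or spark-csv, and the `sample` values stand in for `labelAndPreds.collect()`.

```scala
import java.io.{File, PrintWriter}

// Hypothetical helper (not a Spark API): writes (label, prediction) pairs
// to a single local CSV file with a header row.
def writePairsAsCsv(rows: Seq[(Double, Double)], path: String): Unit = {
  val writer = new PrintWriter(new File(path))
  try {
    writer.println("labels,predictions")
    rows.foreach { case (label, pred) => writer.println(s"$label,$pred") }
  } finally {
    writer.close() // always release the file handle, even on failure
  }
}

// Stand-in for labelAndPreds.collect(); values copied from the question's output.
val sample = Seq((0.3641478408865635, 0.0), (0.9999258409695498, 1.0))
writePairsAsCsv(sample, "labelsAndPreds.csv")
```

In the shell this would be called as `writePairsAsCsv(labelAndPreds.collect(), "labelsAndPreds.csv")`. Collecting pulls every row onto the driver, so this only suits results that comfortably fit in driver memory.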
