I'm learning Scala and I'm curious how to optimize this code. I have an RDD loaded in Spark from a tab-delimited dataset. I want to join the first and second columns with a "-" and append the result as a new column at the end of each line.
For example:
column1\tcolumn2\tcolumn3
becomes
column1\tcolumn2\tcolumn3\tcolumn1-column2
val f = sc.textFile("path/to/dataset")
f.map(line =>
    if (line.split("\t").length > 1)
      line.split("\t") :+ line.split("\t")(0) + "-" + line.split("\t")(1)
    else
      Array[String]())
  .map(a => a.mkString("\t"))
  .saveAsTextFile("output/path")
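Split only once: call split a single time per line and reuse the resulting array instead of splitting again for every access.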
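
A minimal sketch of that idea, assuming the same paths as in the question. The helper name appendCombined is hypothetical (it is not part of Spark); it keeps the question's behavior of emitting an empty line when a row has fewer than two columns.

```scala
// Hypothetical helper: split each line once and reuse the array.
def appendCombined(line: String): String = {
  val cols = line.split("\t") // the only split performed for this line
  if (cols.length > 1)
    (cols :+ s"${cols(0)}-${cols(1)}").mkString("\t") // append "col1-col2"
  else
    "" // same result as the question's empty Array[String]().mkString("\t")
}

// Usage with the RDD from the question (needs a SparkContext named sc):
// sc.textFile("path/to/dataset").map(appendCombined).saveAsTextFile("output/path")
```

Keeping the per-line logic in a plain function also lets you check it without a cluster. If the empty lines are unwanted, filtering them out before saving is one option, but that changes the output compared to the original code.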