I have an RDD with the following type:
org.apache.spark.rdd.RDD[(String, Array[String])] = MappedRDD[40] at map at <console>:14
Here is what x.take(1) looks like:
Array[(String, Array[String])] = Array((8239427349237423,Array(122641|2|2|1|1421990315711|38|6487985623452037|684|, 1229|2|1|1|1411349089424|87|462966136107937|1568|.....))
For each string in the array, I want to split it on "|", take the field at index 6 (the seventh field, counting from zero), and return it prefixed with the first element of the tuple, like this:
8239427349237423-6487985623452037
8239427349237423-462966136107937
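To make the indexing concrete, here is a small plain-Scala sketch (no Spark needed) that splits one record from the sample above and shows that index 6 is the seventh field. The object name `RecordFields` is only for illustration.

```scala
object RecordFields {
  // One record copied from the sample output above.
  val record = "122641|2|2|1|1421990315711|38|6487985623452037|684|"

  // split('|') splits on the literal pipe character; String.split drops
  // the trailing empty field after the final '|'.
  val fields: Array[String] = record.split('|')

  def main(args: Array[String]): Unit = {
    // Print each field next to its zero-based index.
    fields.zipWithIndex.foreach { case (f, i) => println(s"$i: $f") }
  }
}
```

With eight fields (indices 0 to 7), `fields(6)` is `6487985623452037`, the value wanted in the first expected output line.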
I started as follows:
def getValues(lines: Array[String]): Array[String] = {
  // split('|') matches the literal pipe; index 6 is the seventh field
  for (line <- lines) yield line.split('|')(6)
}
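A detail worth checking whenever a string is split on "|": Java's `String.split(String)` interprets its argument as a regular expression, and a lone `|` is an alternation of two empty patterns, so it matches between every pair of characters. This sketch (the object name `SplitPitfall` is illustrative) compares that with escaping the pipe and with Scala's `split(Char)` overload.

```scala
object SplitPitfall {
  val sample = "122641|2|2"

  // Regex "|" matches the empty string everywhere, so every character,
  // including the pipe itself, becomes its own element.
  val asRegex: Array[String] = sample.split("|")

  // Escaping the metacharacter, or using the Char overload, splits on the literal '|'.
  val escaped: Array[String] = sample.split("\\|")
  val asChar: Array[String] = sample.split('|')

  def main(args: Array[String]): Unit = {
    println(asRegex.mkString("[", ", ", "]"))
    println(escaped.mkString("[", ", ", "]"))
    println(asChar.mkString("[", ", ", "]"))
  }
}
```

With the regex version, index 6 of this sample is the single character "|" rather than a field.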
I also tried the following:
val b= x.map(a => (a._1, a._2.flatMap(y => y.split("|")(6))))
But that gave me the following:
Array[(String, Array[Char])] = Array((8239427349237423,Array(1, 2, 4, |, 9, |, 4, 1, 7, 6, |, 2, 9, 2, 7, 2, |, 7, |,....)))
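The `Array[Char]` comes from the inner `flatMap`: its lambda returns a single `String`, and `flatMap` treats that `String` as a sequence of `Char`s and concatenates them. Using `map` on the inner array keeps one `String` per record, and because the desired result has one line per (key, record) pair, the flattening belongs at the outer level instead. The sketch below assumes a local `Seq` standing in for the RDD so it runs without Spark; `keyedValues` and `KeepIndex` are hypothetical names.

```scala
object KeyedFields {
  // Zero-based index of the field to keep (the seventh field).
  val KeepIndex = 6

  // Stand-in for RDD[(String, Array[String])]: same element type, local collection.
  def keyedValues(rows: Seq[(String, Array[String])]): Seq[String] =
    rows.flatMap { case (key, records) =>
      // Inner map (not flatMap): each record yields exactly one String,
      // which must not be flattened into its characters.
      records.toSeq.map(r => s"$key-${r.split('|')(KeepIndex)}")
    }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "8239427349237423" -> Array(
        "122641|2|2|1|1421990315711|38|6487985623452037|684|",
        "1229|2|1|1|1411349089424|87|462966136107937|1568|"))
    keyedValues(sample).foreach(println)
  }
}
```

Applied to the RDD, passing the same lambda to `x.flatMap` should produce an `RDD[String]` with one `key-value` line per record.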